[r/LocalLLaMA]
Anyone want to try my llama.cpp DeepSeek V3.2 PR?
May 6, 2026
Community contributor fairydreaming has opened llama.cpp PR #21149, adding DeepSeek V3.2 inference support with DeepSeek Sparse Attention (DSA). The PR spans 34 files and +2319 lines of new code across 80 commits. Quantized GGUFs are already available at Q4_K_M (~404 GB) and Q8_0 (~714 GB). If you run a multi-GPU CUDA setup, now is a good time to test; you may need to lower the ubatch size to avoid OOM errors in ggml_top_k(). The PR extends llama.cpp's existing DeepSeek MoE support to V3.2's sparse attention architecture, with 26 review comments still open before a potential merge.
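For anyone who wants to try it, here is a minimal invocation sketch. The model filename is hypothetical and the `-ub 256` value is just a starting point, not a recommendation from the PR; `-m`, `-ngl`, `-ub`, and `-p` are standard llama.cpp CLI flags.

```shell
# Build the PR branch first (check out PR #21149), then run llama-cli.
# -ub / --ubatch-size shrinks the physical batch, which lowers per-batch
# memory pressure -- the thing to tune if ggml_top_k() hits OOM.
./llama-cli \
  -m deepseek-v3.2-q4_k_m.gguf \  # hypothetical path to the ~404 GB Q4_K_M GGUF
  -ngl 99 \                       # offload all layers to your CUDA GPUs
  -ub 256 \                       # reduced from the default of 512; tune downward if you still OOM
  -p "Hello"
```

If 256 still OOMs on your setup, halve it again (128, 64, ...) at the cost of prompt-processing throughput; token generation speed is largely unaffected by ubatch size.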