[r/LocalLLaMA] score: 0.20
I have DeepSeek V4 Pro at home
May 10, 2026
A practitioner successfully ran DeepSeek V4 Pro, quantized to Q4_K_M, on a local EPYC workstation with 12×96 GB of RAM (1,152 GB total) and a single RTX PRO 6000 Max-Q GPU, using a community fork of llama.cpp built with CUDA and flash-attention support. This demonstrates that frontier-scale MoE models are now runnable locally when enough CPU RAM is available to hold the weights that don't fit in VRAM.
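As a rough illustration of the partial-offload setup described above, here is a minimal sketch using the llama-cpp-python bindings rather than the community fork the poster used; the GGUF filename, layer count, and context size are hypothetical placeholders, not values from the post.

```python
# Minimal sketch: run a large quantized GGUF model mostly from CPU RAM,
# offloading only a few layers to a single GPU. All concrete values here
# (file path, n_gpu_layers, n_ctx) are assumptions for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-v4-pro-Q4_K_M.gguf",  # hypothetical filename
    n_gpu_layers=8,    # offload a handful of layers to the lone GPU;
                       # the rest of the weights stay in system RAM
    n_ctx=4096,        # modest context so the KV cache fits alongside
    flash_attn=True,   # flash-attention kernel, as mentioned in the post
)

out = llm("Explain mixture-of-experts routing in one sentence.",
          max_tokens=64)
print(out["choices"][0]["text"])
```

The design point is that with MoE models only a small fraction of parameters is active per token, so keeping most weights in system RAM and streaming work through one GPU remains practical despite the model's total size.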
other