[r/LocalLLaMA] score: 0.19

Qwen3-35B-A3B is very usable with 12GB of VRAM

May 8, 2026
Community testing confirms that Qwen3-35B-A3B (IQ4_XS GGUF, MoE architecture, ~3B active parameters) runs usably on a 12GB RTX 3060 via llama.cpp. The key lever is the --n-cpu-moe (-ncmoe) flag, which keeps the expert weights of the first N layers in system RAM instead of VRAM; tuning it frees enough VRAM for 16k-32k of context while keeping decode throughput solid. Relevant for anyone running large MoE models on consumer hardware. A sketch of a typical invocation follows.
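
A minimal sketch of what such a launch might look like, wrapped in Python for reproducibility. The model filename, context size, and the --n-cpu-moe value of 24 are assumptions for illustration, not settings taken from the post; the right -ncmoe value has to be tuned for a specific 12GB card.

```python
# Sketch: launch llama-server with MoE expert offload to CPU.
# Filename, context size, and --n-cpu-moe value are assumptions and
# need per-machine tuning so the weights plus KV cache fit in 12GB.
import subprocess

cmd = [
    "llama-server",
    "-m", "Qwen3-35B-A3B-IQ4_XS.gguf",  # hypothetical GGUF path
    "-c", "16384",                       # 16k context, low end of the post's range
    "-ngl", "99",                        # offload all layers to the GPU...
    "--n-cpu-moe", "24",                 # ...but keep the expert weights of the
                                         # first 24 layers in system RAM
]
subprocess.run(cmd, check=True)
```

Raising --n-cpu-moe moves more expert tensors off the GPU, freeing VRAM for a larger context at some throughput cost; lowering it does the opposite, so the value is typically increased until the model plus KV cache just fits.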
resources