[r/LocalLLaMA] score: 0.19

Qwen3-35B-A3B is very usable with 12GB of VRAM

May 8, 2026
Community testing confirms that Qwen3-35B-A3B (IQ4_XS GGUF, MoE architecture, ~3B active parameters) runs usably on a 12GB RTX 3060 via llama.cpp. The key lever is the --n-cpu-moe (-ncmoe) flag, which keeps the expert weights of the first N layers in system RAM instead of VRAM; tuning it frees enough VRAM for 16k-32k of context while keeping decode throughput solid. Relevant for anyone running large MoE models on consumer hardware. A sketch of a typical invocation follows.
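
A minimal sketch of what such a launch might look like, wrapped in Python for reproducibility. The model filename, context size, and the --n-cpu-moe value of 24 are assumptions for illustration, not settings taken from the post; the right -ncmoe value has to be tuned for a specific 12GB card.

```python
# Sketch: launch llama-server with MoE expert offload to CPU.
# Filename, context size, and --n-cpu-moe value are assumptions and
# need per-machine tuning so the weights plus KV cache fit in 12GB.
import subprocess

cmd = [
    "llama-server",
    "-m", "Qwen3-35B-A3B-IQ4_XS.gguf",  # hypothetical GGUF path
    "-c", "16384",                       # 16k context, low end of the post's range
    "-ngl", "99",                        # offload all layers to the GPU...
    "--n-cpu-moe", "24",                 # ...but keep the expert weights of the
                                         # first 24 layers in system RAM
]
subprocess.run(cmd, check=True)
```

Raising --n-cpu-moe moves more expert tensors off the GPU, freeing VRAM for a larger context at some throughput cost; lowering it does the opposite, so the value is typically increased until the model plus KV cache just fits.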
resources