[r/LocalLLaMA] score: 0.25
Quantisation effects of Qwen3 235B-A22B
April 25, 2026
**Quantization Sensitivity in Qwen3 235B-A22B (22B Active Parameters)**
A practitioner reports noticeable quality degradation when running Qwen3 235B-A22B at Q4\_K\_XL versus Q8 quantization on 48GB of VRAM, with subjective improvements in tool-calling reliability and instruction-following nuance at the higher precision. The observation fits a plausible hypothesis: MoE architectures with small active parameter counts (roughly 22B active out of 235B total) may be disproportionately sensitive to quantization, since fewer weights are engaged per forward pass, making the precision of each one more consequential. This is an informal vibe test rather than a controlled benchmark, but it highlights a practical tradeoff for anyone deploying this model: the memory delta between Q4\_K\_XL and Q8 is substantial, and Q6\_K\_XL may be a middle ground worth systematic evaluation.
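As a rough back-of-the-envelope check on that memory delta, the sketch below estimates weight-only footprints at approximate bits-per-weight (bpw) figures for these quant types; the bpw values are assumptions (llama.cpp quant averages vary by layer mix, and the `_XL` variants are dynamic quants), and real usage adds KV cache and runtime overhead on top.

```python
# Rough weight-only memory estimate for a 235B-parameter MoE model at
# common quantization levels. The bpw figures below are approximate
# averages (assumption), not exact per-quant values; KV cache,
# activations, and runtime overhead are excluded.

TOTAL_PARAMS = 235e9  # total parameters (only ~22B active per token)

# Assumed average bits per weight for each quant type.
QUANTS = {
    "Q4_K_XL": 4.9,
    "Q6_K_XL": 6.6,
    "Q8_0": 8.5,
}

for name, bpw in QUANTS.items():
    gib = TOTAL_PARAMS * bpw / 8 / 2**30
    print(f"{name:>8}: ~{gib:,.0f} GiB weights")
```

Even the smallest quant lands well above 48GB of VRAM, so the practitioner is presumably offloading expert layers to system RAM; the roughly 100 GiB gap between Q4\_K\_XL and Q8 then shows up as offload pressure and token throughput rather than as a simple fits-or-doesn't question, which is part of why Q6\_K\_XL is an interesting midpoint to test.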
discussion