Unsloth-quantized Qwen 3.5 122B runs at 30 tokens per second
June 30, 2026
An Unsloth-quantized build of Qwen 3.5 122B (UD-IQ4_NL) generates about 30 tokens per second on hardware with 64 GB of VRAM. The configuration supports a 100k-token context window with a bf16 cache by offloading select layers to CPU and system RAM.
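To get a feel for why a 100k bf16 context pushes some layers off the GPU, the rough arithmetic can be sketched as below. This is a back-of-envelope estimate, not the configuration used in the report: the layer count, KV-head count, and head dimension in the example are hypothetical placeholders (the source does not state the model's architecture), and the formula assumes a standard key/value cache in every layer, which may overestimate for models that mix in other attention types.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Estimate KV-cache size for standard attention.

    bytes_per_value=2 corresponds to bf16. The leading factor of 2
    accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def generation_seconds(n_tokens, tokens_per_second=30.0):
    """Time to generate n_tokens at a steady decode rate (30 tok/s from the report)."""
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    # HYPOTHETICAL architecture values, chosen only for illustration;
    # substitute the real numbers from the model's config file.
    cache = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128,
                           context_tokens=100_000)
    print(f"KV cache: {cache / 2**30:.1f} GiB")
    print(f"1500-token reply: {generation_seconds(1500):.0f} s")
```

Whatever the real figures are, the cache competes with the quantized weights for the same 64 GB of VRAM, which is the pressure that CPU/RAM offloading relieves.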
HOW THIS AFFECTS YOU
● Builder: You can run large coding models locally, with high throughput, on workstation hardware with 64 GB of VRAM.
● Researcher: This suggests that Unsloth quantizations can sustain throughput even with large context windows.