[r/LocalLLaMA] score: 0.12

Qwen 3.5 122B Unsloth quantization runs at 30 tokens per second

June 30, 2026

An Unsloth-quantized Qwen 3.5 122B model (UD-IQ4_NL) reportedly generates 30 tokens per second on hardware with 64GB of VRAM. The configuration supports a 100k-token context window with a bf16 cache by offloading selected layers to the CPU and system RAM.
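A rough memory budget helps show why some layers have to leave the GPU in a setup like this. The sketch below is back-of-the-envelope arithmetic, not the poster's measurement. It assumes about 4.5 bits per weight, which is the nominal rate of plain IQ4_NL (the UD variant mixes quantization types, so the real file size will differ). The layer count, KV head count, and head dimension are hypothetical placeholders, not the model's published architecture.

```python
GIB = 2**30


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 n_tokens: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB for a standard attention stack.

    Each layer stores one K and one V vector of n_kv_heads * head_dim
    elements per token; bf16 uses 2 bytes per element.
    """
    elems = 2 * n_layers * n_kv_heads * head_dim * n_tokens
    return elems * bytes_per_elem / GIB


# From the post: 122B parameters, 64GB VRAM, 100k-token bf16 context.
N_PARAMS = 122e9
VRAM_GIB = 64
N_TOKENS = 100_000

# Assumption: nominal IQ4_NL rate; UD-IQ4_NL will not match this exactly.
BITS_PER_WEIGHT = 4.5

# Hypothetical architecture values, for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 48, 8, 128

weights = weight_gib(N_PARAMS, BITS_PER_WEIGHT)
kv = kv_cache_gib(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_TOKENS)

# Whatever does not fit in VRAM has to live in system RAM.
overflow = max(0.0, weights + kv - VRAM_GIB)

print(f"weights  ~{weights:.1f} GiB")
print(f"kv cache ~{kv:.1f} GiB")
print(f"overflow ~{overflow:.1f} GiB to place in CPU RAM")
```

Under these assumptions the weights alone take up nearly all of the 64GB, so a long-context cache cannot fit alongside them without moving part of the model to system RAM.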

HOW THIS AFFECTS YOU

● builder: You can run large coding models locally on consumer-grade workstation hardware at high throughput.

● researcher: This suggests that Unsloth quantizations can sustain usable throughput at long context lengths.

read original ↗reddit.com
