[r/LocalLLaMA] score: 0.18

Got MTP + TurboQuant running: Qwen3.6-27B at 80+ t/s with 262K context on a single RTX 4090

May 8, 2026
A practitioner achieved 80-87 t/s on Qwen3.6-27B at 262K context on a single RTX 4090 (24 GB) by combining Multi-Token Prediction (MTP) with TurboQuant's TBQ4_0 KV-cache quantization, which is claimed to be lossless at 4.25 bpw. The Q4_K_M model runs with grafted MTP heads at a 73% draft acceptance rate, nearly doubling throughput from a 43 t/s baseline. Long-context inference on consumer hardware just got a serious upgrade worth replicating.
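A rough sanity check of the reported speedup, assuming the standard speculative-decoding acceptance model (each of k drafted tokens accepted independently with probability alpha, draft overhead ignored); the post does not say how many tokens the MTP heads draft per step, so k here is an assumption, not a reported figure:

```python
# Back-of-envelope check of the numbers in the post.
# Assumption (not from the post): classic speculative-decoding model,
# where k drafted tokens are each accepted with probability alpha and
# drafting cost is ignored.

def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens committed per verification step: 1 + alpha + ... + alpha^k."""
    return sum(alpha ** i for i in range(k + 1))

baseline_tps = 43.0  # reported baseline throughput, t/s
alpha = 0.73         # reported draft acceptance rate

for k in (1, 2, 3):
    speedup = expected_tokens_per_step(alpha, k)
    print(f"k={k}: ~{speedup:.2f}x -> ~{baseline_tps * speedup:.0f} t/s")

# KV-cache size reduction going from 16-bit to 4.25 bpw, as described:
print(f"KV memory reduction: {16 / 4.25:.2f}x")
```

With two or three drafted tokens per step, this idealized model lands in the reported 80-87 t/s range, so the claimed near-doubling is plausible; the ~3.8x KV-cache shrink is what makes 262K context fit alongside the Q4_K_M weights in 24 GB.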