[r/LocalLLaMA]
A First Comprehensive Study of TurboQuant: Accuracy and Performance
May 14, 2026
A benchmarking study of TurboQuant KV-cache quantization in vLLM finds that FP8 (--kv-cache-dtype fp8) remains the optimal default, delivering a 2x capacity gain with negligible accuracy loss. TurboQuant k8v4 offers only a 2.4x saving, a marginal improvement over FP8, while hurting both throughput and latency. TurboQuant 4-bit non-contiguous is the most practical variant for memory-constrained deployments.
resources
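For anyone who wants to reproduce the FP8 baseline the study compares against, here is a minimal sketch of enabling the FP8 KV cache in vLLM's offline API (equivalent to passing --kv-cache-dtype fp8 to vllm serve). The model name and prompt are illustrative assumptions, not from the study.

```python
# Minimal sketch: FP8 KV-cache baseline in vLLM (offline inference API).
# Equivalent to: vllm serve <model> --kv-cache-dtype fp8
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model (assumption)
    kv_cache_dtype="fp8",                      # store KV cache in FP8, ~2x capacity
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Explain KV-cache quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

Note that this only covers the FP8 default; the TurboQuant k8v4 and 4-bit non-contiguous variants discussed in the study are separate configurations not shown here.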