
What is speculative decoding? Speculative decoding is an inference optimization that uses a fast, small "draft" model to quickly propose several fu…

June 15, 2026

Speculative decoding uses a small draft model to propose several tokens ahead, which the larger target model then verifies in a single forward pass. This raises throughput without changing the output distribution. LMSYS's DFlash plus Spec V2, now the default in SGLang, reaches 4.3x the baseline throughput and 1.5x the throughput of native MTP (multi-token prediction) on Qwen 3.5 397B-A17B, measured on the HumanEval benchmark at concurrency 1 on 8xB200 GPUs.
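
To make the draft-then-verify loop concrete, here is a minimal sketch of one round of standard speculative sampling on a toy four-token vocabulary. It is not DFlash, Spec V2, or SGLang code: `draft_probs`, `target_probs`, `speculative_step`, and the vocabulary size are hypothetical stand-ins chosen for illustration, and a real system would score all drafted positions in one batched target pass rather than a Python loop.

```python
import random

VOCAB = 4  # toy vocabulary size (assumption for illustration)


def _normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def draft_probs(context):
    # Hypothetical cheap model: mildly prefers (last token + 1).
    last = context[-1] if context else 0
    return _normalize([3.0 if t == (last + 1) % VOCAB else 1.0 for t in range(VOCAB)])


def target_probs(context):
    # Hypothetical expensive model: strongly prefers (last token + 1).
    last = context[-1] if context else 0
    return _normalize([8.0 if t == (last + 1) % VOCAB else 1.0 for t in range(VOCAB)])


def sample(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def speculative_step(context, k, rng):
    """Run one draft-then-verify round and return the emitted tokens."""
    # 1. Draft k tokens with the cheap model, remembering its distributions.
    drafted, q_list = [], []
    ctx = list(context)
    for _ in range(k):
        q = draft_probs(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_list.append(q)
        ctx.append(tok)

    # 2. Score every drafted prefix with the target model.
    #    (A real engine does this in a single batched forward pass.)
    p_list = [target_probs(list(context) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept drafted tokens left to right with probability min(1, p/q).
    out = []
    for i, tok in enumerate(drafted):
        p, q = p_list[i], q_list[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # First rejection: resample from the residual max(0, p - q) and stop.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(sample(_normalize(residual), rng))
        return out

    # All k drafts accepted: take one bonus token from the target.
    out.append(sample(p_list[k], rng))
    return out
```

Each round emits between 1 and `k + 1` tokens. The accept-or-resample rule is what keeps the emitted tokens distributed as if the target model had sampled them itself, so the speedup depends on how often the draft agrees with the target.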

