[arXiv] score: 0.19
SpecTr-GBV: Multi-Draft Block Verification Accelerating Speculative Decoding
April 30, 2026
SpecTr-GBV unifies multi-draft speculative decoding and greedy block verification in a single optimal transport framework, and proves that its expected acceptance length is optimal under i.i.d. draft-generation assumptions. Prior methods addressed only one strategy each (SpecTr handles multiple drafts, GBV handles block verification), so the gains from combining them went unrealized. Inference engineers working to reduce LLM serving latency should evaluate it for throughput-critical pipelines that already deploy draft-target model pairs.
cs.CL
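For readers unfamiliar with the term "acceptance length", the sketch below shows the standard single-draft, token-by-token verification rule from speculative sampling, which is the baseline that multi-draft and block-level schemes aim to improve on. It is not the SpecTr-GBV algorithm. The function name `verify_draft` and the list-of-lists probability layout are assumptions made for illustration only.

```python
import random


def verify_draft(draft_tokens, q_probs, p_probs, rng):
    """Baseline token-level speculative verification (NOT SpecTr-GBV).

    draft_tokens: token ids sampled from the draft model, one per position.
    q_probs[i]:   draft-model distribution over the vocabulary at position i.
    p_probs[i]:   target-model distribution over the vocabulary at position i.
    Returns (accepted_tokens, correction), where correction is a token
    resampled from the residual distribution at the first rejection,
    or None if every draft token was accepted.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        q = q_probs[i][tok]  # > 0 because tok was sampled from q
        p = p_probs[i][tok]
        # Accept the draft token with probability min(1, p/q).
        if rng.random() < min(1.0, p / q):
            accepted.append(tok)
            continue
        # On rejection, resample from the residual max(0, p - q).
        # rng.choices normalizes the weights internally.
        residual = [max(0.0, pi - qi)
                    for pi, qi in zip(p_probs[i], q_probs[i])]
        correction = rng.choices(range(len(residual)),
                                 weights=residual, k=1)[0]
        return accepted, correction
    return accepted, None


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.7, 0.3], [0.6, 0.4]]  # hypothetical draft distributions
    p = [[0.5, 0.5], [0.6, 0.4]]  # hypothetical target distributions
    accepted, correction = verify_draft([0, 1], q, p, rng)
    print("acceptance length:", len(accepted), "correction:", correction)
```

The acceptance length here is `len(accepted)`. Under this rule a single rejection discards the rest of the draft, which is the inefficiency that verifying whole blocks and considering several drafts at once are meant to reduce.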