CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration

May 13, 2026

CATS (Cascaded Adaptive Tree Speculation) is a speculative decoding method for memory-limited LLM inference. It optimizes tree-based speculation under HBM capacity constraints to accelerate autoregressive decoding, which is otherwise bottlenecked by memory bandwidth.

cs.LG, cs.AI

SOURCE

https://arxiv.org/abs/2605.11186
