[arXiv]

CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration

May 13, 2026
CATS applies cascaded adaptive tree speculation to memory-limited LLM inference. It adapts speculative decoding to fit high-bandwidth memory (HBM) constraints in order to accelerate autoregressive decoding, which is otherwise limited by memory bandwidth.
cs.LG cs.AI