[arXiv] score: 0.47
CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
May 13, 2026
CATS (Cascaded Adaptive Tree Speculation) optimizes speculative decoding under high-bandwidth memory (HBM) constraints, accelerating autoregressive decoding for memory-limited LLM inference past the memory-bandwidth bottleneck.
cs.LG, cs.AI