[arXiv] score: 0.19
CriticalKV Cuts KV Cache via Perturbation-Constrained Entry Selection
May 29, 2026
CriticalKV formalizes KV cache pruning by analyzing attention output perturbation, showing that value states and pretrained parameter matrices matter beyond attention weights alone. The resulting plug-and-play selection algorithm adds negligible overhead and improves long-sequence inference efficiency.
cs.CL
HOW THIS AFFECTS YOU
● builder: You can drop CriticalKV into existing inference stacks as a plug-and-play module to reduce KV cache memory costs on long-context workloads.
● researcher: The perturbation-constrained formalization provides theoretical grounding for KV cache eviction that prior attention-weight heuristics lacked.
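
To make the summary's point concrete, here is a minimal sketch of why value states and the output projection can matter beyond attention weights alone. It is not the paper's algorithm. The scoring rule (attention weight times the L1 norm of the projected value state), the function names, and the toy numbers are all assumptions chosen for illustration, and softmax renormalization after eviction is ignored for simplicity. Under those assumptions, the output change from dropping a set of entries is bounded by the sum of their scores (triangle inequality), so keeping the highest-scoring entries keeps that bound small.

```python
def matvec(vec, mat):
    """Row vector times matrix (vec @ mat), using plain lists."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


def select_entries(attn_weights, values, w_o, budget):
    """Return the sorted indices of the `budget` cache entries to keep.

    Assumed heuristic score: attention weight times the L1 norm of the
    value state after projection through the output matrix w_o.
    """
    scores = []
    for a, v in zip(attn_weights, values):
        projected = matvec(v, w_o)
        scores.append(a * sum(abs(x) for x in projected))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def perturbation(attn_weights, values, w_o, keep):
    """L1 norm of the output change when entries outside `keep` are dropped.

    Simplifying assumption: attention weights are not renormalized
    after eviction.
    """
    kept = set(keep)
    delta = [0.0] * len(w_o[0])
    for i, (a, v) in enumerate(zip(attn_weights, values)):
        if i not in kept:
            for j, x in enumerate(matvec(v, w_o)):
                delta[j] += a * x
    return sum(abs(x) for x in delta)


# Toy data (hypothetical): entry 0 has the largest attention weight but a
# tiny value state; entry 2 has the smallest weight but a large value state.
attn = [0.5, 0.3, 0.2]
values = [[0.1, 0.0], [1.0, 1.0], [3.0, 0.0]]
w_o = [[2.0, 0.0], [0.0, 0.5]]

keep_projected = select_entries(attn, values, w_o, budget=2)
keep_attention_only = [0, 1]  # top-2 by attention weight alone

print(keep_projected, perturbation(attn, values, w_o, keep_projected))
print(keep_attention_only, perturbation(attn, values, w_o, keep_attention_only))
```

In this toy case the attention-only choice evicts the entry with the largest projected value state and so changes the output more than the projection-aware choice does. A real integration would also need to handle renormalization, multiple heads, and per-layer budgets, none of which this sketch covers.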