RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction
May 7, 2026
RetentiveKV (arXiv:2605.04075) introduces an entropy-driven KV cache compression framework for multimodal LLMs that replaces discrete token pruning with continuous memory evolution based on a State Space Model (SSM). The method targets two failure modes of existing approaches: deferred visual token importance, where a token that scores low early in decoding becomes important later, and spatial discontinuity caused by hard eviction. By quantifying each token's information potential via entropy, RetentiveKV avoids prematurely evicting visual tokens that only look unimportant early in decoding. Engineers deploying vision-language models on long-context visual tasks should consider it ahead of attention-score-based pruning methods such as H2O or SnapKV, which assume that token importance stays static throughout decoding.
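To make the contrast with hard eviction concrete, here is a minimal sketch of the general idea, not the paper's algorithm. It assumes each cached token comes with some probability vector to take the entropy of (the summary does not say what RetentiveKV computes entropy over), and it stands in for the SSM with a single scalar-decay linear recurrence. The names `soft_evict`, `token_dists`, `budget`, and `decay` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_evict(values, token_dists, budget, decay=0.9):
    """Keep the `budget` highest-entropy tokens; fold the rest into a state.

    values:      one value vector per cached token (lists of floats)
    token_dists: one probability vector per token (ASSUMPTION: what the
                 entropy is taken over is not specified in the summary)
    Returns (kept_indices, state), where `state` summarizes evicted tokens.
    """
    scores = [entropy(d) for d in token_dists]

    # Rank tokens by entropy (highest first), keep `budget` of them,
    # then restore the original token order.
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:budget])
    kept = set(keep)

    # Instead of discarding the remaining tokens, accumulate them with a
    # linear recurrence s <- decay * s + (1 - decay) * v. This is an
    # illustrative stand-in for SSM-style memory, not the paper's model.
    state = [0.0] * len(values[0])
    for i, v in enumerate(values):
        if i not in kept:
            state = [decay * s + (1.0 - decay) * x for s, x in zip(state, v)]
    return keep, state


if __name__ == "__main__":
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    token_dists = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1]]
    keep, state = soft_evict(values, token_dists, budget=2)
    print(keep, state)
```

In this toy run the first token has zero entropy, so it is the one folded into `state` while the other two stay in the cache. A pruning scheme in the style of H2O or SnapKV would rank by accumulated attention scores and drop the losers outright, leaving nothing behind if a dropped token matters later.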
cs.LG, cs.AI, cs.CL