
RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction

May 7, 2026

RetentiveKV (arXiv:2605.04075) introduces an entropy-driven KV cache compression framework for multimodal LLMs that replaces discrete token pruning with continuous memory evolution based on a State Space Model (SSM). The method targets two failure modes of existing approaches: deferred visual token importance, where tokens that look unimportant early in decoding become important later, and spatial discontinuity caused by hard eviction. By quantifying information potential via entropy, RetentiveKV avoids prematurely evicting visual tokens whose salience only emerges later in decoding. Engineers deploying vision-language models on long-context visual tasks should evaluate this ahead of attention-score-based pruning methods such as H2O or SnapKV, which assume that a token's importance, once observed, persists.
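To make the contrast with hard eviction concrete, here is a minimal toy sketch in pure Python. It is not the paper's algorithm: the function names, the `decay` parameter, the simple linear recurrence standing in for an SSM update, and the use of raw attention weights as retention scores are all assumptions made for illustration. It shows only the general idea that entries outside the retention budget are folded into a running state rather than discarded, and how the entropy of an attention distribution can be measured. How RetentiveKV actually combines the two is not specified in this summary.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution.

    `weights` are non-negative and are normalised here, so they need
    not already sum to 1.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fold_into_state(state, vector, decay=0.9):
    """One step of a toy linear recurrence: h <- decay * h + (1 - decay) * x.

    This stands in for an SSM state update (an assumption for
    illustration, not the paper's formulation).
    """
    return [decay * h + (1.0 - decay) * x for h, x in zip(state, vector)]


def compress_cache(values, scores, budget, decay=0.9):
    """Keep the `budget` highest-scoring entries exactly.

    All other entries are folded, in cache order, into a single state
    vector instead of being dropped. Returns (kept_indices, state).
    """
    order = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:budget])
    kept_set = set(kept)
    state = [0.0] * len(values[0])
    for i, vector in enumerate(values):
        if i not in kept_set:
            state = fold_into_state(state, vector, decay)
    return kept, state


if __name__ == "__main__":
    # Hypothetical cache of three 2-dimensional value vectors.
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    scores = [0.7, 0.1, 0.2]  # e.g. attention weights from one decoding step
    print("entropy:", attention_entropy(scores))
    print("compressed:", compress_cache(values, scores, budget=2, decay=0.5))
```

With hard eviction, the lowest-scoring entry would simply disappear; in this sketch its contribution survives in `state`, so a token that only becomes relevant later still leaves a trace in memory.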

cs.LG, cs.AI, cs.CL

SOURCE

https://arxiv.org/abs/2605.04075
