[arXiv] score: 0.67
Tensor Cache uses evicted KV pairs as fast-weight L2 memory to extend transformer context
May 25, 2026
A two-level cache architecture pairs sliding-window attention with a fixed-size outer-product fast-weight matrix fed by evicted KV pairs, allowing transformers to access compressed representations of tokens outside the attention window via a single matrix multiply.
cs.LG, cs.AI
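
The two-level design described in the summary can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the class name `TwoLevelKVCache`, the plain additive outer-product update, the softmax scaling, and the choice to return the local and fast-weight reads separately are all assumptions made for illustration, since the summary does not specify the update rule or how the two reads are combined.

```python
from collections import deque

import numpy as np


class TwoLevelKVCache:
    """Hypothetical sketch: L1 holds exact KV pairs in a sliding window,
    L2 is a fixed-size fast-weight matrix fed by evicted pairs."""

    def __init__(self, d_k: int, d_v: int, window: int):
        self.window = window
        self.keys = deque()    # L1: most recent keys, oldest first
        self.values = deque()  # L1: most recent values, oldest first
        # L2: size (d_k, d_v) stays fixed no matter how many tokens are seen.
        self.fast_weights = np.zeros((d_k, d_v))

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        """Add a new KV pair; fold the oldest pair into L2 if the window overflows."""
        self.keys.append(k)
        self.values.append(v)
        if len(self.keys) > self.window:
            k_old = self.keys.popleft()
            v_old = self.values.popleft()
            # Assumed update rule: plain additive outer product k v^T.
            self.fast_weights += np.outer(k_old, v_old)

    def read(self, q: np.ndarray):
        """Return (local, remote): softmax attention over the window, and a
        single matrix multiply against the fast weights. Assumes the window
        holds at least one token."""
        K = np.stack(self.keys)    # (n, d_k)
        V = np.stack(self.values)  # (n, d_v)
        scores = K @ q / np.sqrt(q.shape[0])
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        local = weights @ V             # exact attention inside the window
        remote = q @ self.fast_weights  # compressed read of evicted tokens
        return local, remote
```

In this sketch, if the evicted keys happen to be orthonormal, querying with one of them returns its stored value exactly; with correlated keys the stored values interfere and the read is only approximate, which is the sense in which the outer memory is compressed.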