[arXiv] score: 0.67

Tensor Cache uses evicted KV pairs as fast-weight L2 memory to extend transformer context

May 25, 2026

A two-level cache architecture pairs sliding-window attention with a fixed-size outer-product fast-weight matrix fed by evicted KV pairs, allowing transformers to access compressed representations of tokens outside the attention window via a single matrix multiply.
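
To make the mechanism concrete, here is a minimal NumPy sketch of the idea as summarized above, not the paper's implementation. The class name `TwoLevelCache`, the unnormalized additive update, and returning the window and memory reads separately are all assumptions made for illustration; the paper may use gating, decay, normalization, or a different way of merging the two reads.

```python
from collections import deque

import numpy as np


class TwoLevelCache:
    """Hypothetical sketch: sliding-window KV cache (L1) backed by a
    fixed-size outer-product fast-weight matrix (L2)."""

    def __init__(self, d_k: int, d_v: int, window: int):
        self.window = window
        self.l1 = deque()                 # recent (key, value) pairs
        self.l2 = np.zeros((d_v, d_k))    # size is independent of sequence length

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        """Add a new KV pair; fold the oldest pair into L2 if the window overflows."""
        self.l1.append((k, v))
        if len(self.l1) > self.window:
            k_old, v_old = self.l1.popleft()
            # Assumed update rule: plain additive outer product.
            self.l2 += np.outer(v_old, k_old)

    def read(self, q: np.ndarray):
        """Return (window attention output, L2 memory read) for query q."""
        keys = np.stack([k for k, _ in self.l1])
        vals = np.stack([v for _, v in self.l1])
        scores = keys @ q / np.sqrt(q.shape[0])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        local = weights @ vals            # softmax attention inside the window
        remote = self.l2 @ q              # single matrix multiply over evicted tokens
        return local, remote


# Usage: three tokens with orthonormal keys and a window of two,
# so the first token is evicted into L2.
d_k, d_v = 4, 3
keys = np.eye(d_k)
values = np.arange(9, dtype=float).reshape(3, d_v)
cache = TwoLevelCache(d_k, d_v, window=2)
for i in range(3):
    cache.append(keys[i], values[i])
local, remote = cache.read(keys[0])       # query with the evicted token's key
```

Because the keys in this usage example are orthonormal, querying with the evicted key recovers its value exactly from the memory read. With correlated keys the read mixes in contributions from other evicted pairs, which is the sense in which the L2 representation is compressed.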

cs.LG, cs.AI

SOURCE

https://arxiv.org/abs/2605.22884
