[arXiv]

Sleep-Like KV Cache Consolidation into Fast Weights Improves Long-Context Transformer Performance

May 26, 2026

Periodically consolidating the accumulated KV cache into persistent fast weights, via offline recurrent passes over SSM blocks, lets transformers handle long-horizon tasks (including multi-hop graph retrieval and math reasoning) on which standard transformers and SSM-attention hybrids fail. Performance scales with the sleep duration N.
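
The summary does not say how the consolidation step works internally. As a loose illustration of "replaying a cache into fast weights N times", the sketch below swaps in a plain delta-rule (Widrow-Hoff) fast-weight matrix in place of the paper's SSM blocks. Each sleep pass replays every cached (key, value) pair and nudges W toward storing it. The names `consolidate`, `recall_error` and `n_passes`, and the toy two-entry cache, are hypothetical and illustrative only; this is not the authors' method.

```python
import math

def matvec(W, x):
    # Read from the fast-weight matrix: y = W x.
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def consolidate(kv_cache, d_k, d_v, n_passes, beta=1.0):
    """Fold cached (key, value) pairs into a d_v x d_k fast-weight matrix.

    Hypothetical sketch: each pass replays the whole cache and applies the
    delta rule W += beta * (v - W k) k^T. n_passes stands in for the
    'sleep duration' N; it is not the paper's SSM-based procedure.
    """
    W = [[0.0] * d_k for _ in range(d_v)]
    for _ in range(n_passes):
        for k, v in kv_cache:
            err = [vi - pi for vi, pi in zip(v, matvec(W, k))]
            for i in range(d_v):
                for j in range(d_k):
                    W[i][j] += beta * err[i] * k[j]
    return W

def recall_error(W, kv_cache):
    # Mean squared error when each value is read back using its own key.
    total = 0.0
    for k, v in kv_cache:
        total += sum((vi - pi) ** 2 for vi, pi in zip(v, matvec(W, k)))
    return total / len(kv_cache)

# Toy cache: two unit-norm, non-orthogonal keys with scalar values.
s = 1 / math.sqrt(2)
cache = [([1.0, 0.0], [1.0]), ([s, s], [0.0])]
for n in (1, 2, 4, 8):
    print(n, recall_error(consolidate(cache, d_k=2, d_v=1, n_passes=n), cache))
```

With unit-norm keys and beta = 1, each update makes the current pair exactly recallable. Because the two keys overlap, however, writing one pair partly overwrites the other. For this toy cache the error works out to 0.25**N / 2, up to floating-point rounding, so every extra pass cuts it by a factor of four. This is only an analogy to the reported scaling with N, not a reproduction of it.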

cs.CL, cs.AI

HOW THIS AFFECTS YOU

● Builder: This architecture could reduce KV cache memory costs for long-context inference while improving performance on multi-hop reasoning tasks, though it requires the model to include SSM blocks.

● Researcher: The sleep-wake consolidation mechanism offers a biologically inspired, empirically validated approach to long-context memory that outperforms standard attention-only transformers and SSM-attention hybrids on structured reasoning tasks.
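
The memory argument in the builder note can be made concrete with some back-of-the-envelope arithmetic. This sketch makes a hypothetical assumption: each head's consolidated memory is a single head_dim × head_dim matrix, as in linear-attention-style fast weights. The summary does not give the paper's actual fast-weight sizes. The fp16 configuration (32 layers, 8 heads, head_dim 128, 128k tokens) is illustrative only. The helper names `kv_cache_bytes` and `fast_weight_bytes` are made up for this example.

```python
def kv_cache_bytes(tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # One key vector and one value vector per token, per KV head, per layer.
    return 2 * tokens * n_layers * n_kv_heads * head_dim * bytes_per_elem

def fast_weight_bytes(n_layers, n_heads, head_dim, bytes_per_elem=2):
    # Hypothetical: one head_dim x head_dim matrix per head per layer,
    # whose size does not grow with the number of consolidated tokens.
    return n_layers * n_heads * head_dim * head_dim * bytes_per_elem

# Hypothetical fp16 config: 32 layers, 8 heads, head_dim 128, 128k tokens.
print(kv_cache_bytes(131072, 32, 8, 128) / 2**30, "GiB")  # 16.0 GiB
print(fast_weight_bytes(32, 8, 128) / 2**20, "MiB")       # 8.0 MiB
```

Under these assumptions, with equal head counts the two break even at head_dim / 2 tokens (64 here). Past that point the fixed-size memory is smaller. The comparison ignores the KV cache the model still holds for tokens seen since the last sleep phase, as well as any SSM state.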

SOURCE

https://arxiv.org/abs/2605.26099
