[arXiv] score: 0.86
Sleep-Like KV Cache Consolidation into Fast Weights Improves Long-Context Transformer Performance
May 26, 2026
Periodically converting the accumulated KV cache into persistent fast weights via offline recurrent passes over SSM blocks enables transformers to handle long-horizon tasks, including multi-hop graph retrieval and math reasoning, where standard transformers and SSM-attention hybrids fail. Performance scales with sleep duration N.
cs.CL cs.AI
HOW THIS AFFECTS YOU
● builder: This architecture could reduce KV cache memory costs for long-context inference while improving performance on multi-hop reasoning tasks, though it requires SSM blocks in the model.
● researcher: The sleep-wake consolidation mechanism provides a biologically inspired, empirically validated approach to long-context memory that outperforms standard attention and hybrid models on structured reasoning tasks.