[HUGGINGFACE]score: 0.48

RAT+ Exponential Memory Boosts Sparse KV-Cache Methods on Long-Context Tasks

May 26, 2026

Adding exponentially decaying recurrence (RAT+) to sparse attention methods Quest, MoBA, and SnapKV consistently improves accuracy on eight needle-in-a-haystack tasks across sparse KV budgets. Gains validated on both RAT+ checkpoints and OLMo2-7B continued-pretrained for 10B tokens with the memory module added.

HOW THIS AFFECTS YOU

●

builderIf you're using Quest or SnapKV for KV-cache compression in production, RAT+ pretraining may be worth the cost for accuracy recovery at aggressive sparsity budgets.

●

researcherDemonstrates that recurrence-augmented attention is complementary to query-aware sparsity rather than a replacement — useful for long-context architecture design decisions.

read original ↗huggingface.co

← back to feed