[HUGGINGFACE] score: 0.91

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

May 15, 2026

A method converts full-attention LLMs into sparse-attention models in under 100 training steps. It rests on the observation that only a subset of attention heads needs full long-context processing. The approach avoids both native sparse training and heuristic token eviction, and targets the quadratic cost of attention in long-context inference. Practitioners deploying long-context models could reduce inference cost with minimal retraining overhead, compared with training sparse architectures from scratch.
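To make the head-level idea concrete, here is a minimal sketch, assuming a simple hybrid layer in which some heads keep full causal attention and the rest switch to sliding-window attention. The summary does not say which sparse pattern the paper uses or how heads are chosen. The sliding window is therefore a swapped-in stand-in, and `full_heads`, `window`, `hybrid_attention`, and `score_count` are hypothetical names. The short adaptation training is not modeled. `score_count` tallies query-key dot products to show why keeping only a few full heads cuts the quadratic term.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def head_attention(q: Matrix, k: Matrix, v: Matrix, window: Optional[int]) -> Matrix:
    # Causal scaled dot-product attention for a single head.
    # window=None: full attention, query i attends to keys 0..i (quadratic in n).
    # window=w: sliding-window stand-in, query i attends to keys i-w+1..i only.
    d = len(q[0])
    out: Matrix = []
    for i, qi in enumerate(q):
        start = 0 if window is None else max(0, i - window + 1)
        keys = range(start, i + 1)
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) for j in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j][c] for w, j in zip(weights, keys))
                    for c in range(len(v[0]))])
    return out


def hybrid_attention(qs: Sequence[Matrix], ks: Sequence[Matrix], vs: Sequence[Matrix],
                     full_heads: Sequence[bool], window: int) -> List[Matrix]:
    # full_heads is a hypothetical per-head flag: True keeps full attention,
    # False switches that head to the sliding-window stand-in.
    return [head_attention(q, k, v, None if full else window)
            for q, k, v, full in zip(qs, ks, vs, full_heads)]


def score_count(n: int, full_heads: Sequence[bool], window: int) -> int:
    # Query-key dot products computed across all heads for n tokens.
    full = n * (n + 1) // 2                            # causal full attention
    local = sum(min(i + 1, window) for i in range(n))  # sliding window
    return sum(full if f else local for f in full_heads)


if __name__ == "__main__":
    n, heads, window = 4096, 32, 512
    dense = score_count(n, [True] * heads, window)
    hybrid = score_count(n, [True] * 4 + [False] * (heads - 4), window)
    print(dense, hybrid, round(hybrid / dense, 3))
```

With 4096 tokens, 32 heads, and an arbitrary choice of 4 full heads and a 512-token window, the hybrid count is about 33% of the all-full count. These settings are illustrative only and are not figures from the paper. A window at least as long as the sequence reproduces full attention exactly, which is a quick sanity check on the masking logic.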

paper

SOURCE

https://huggingface.co/papers/2605.16928
