[HUGGINGFACE]score: 0.91
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps
May 15, 2026
The paper presents a method that converts full-attention LLMs into sparse-attention models in under 100 training steps, based on the observation that only a subset of attention heads requires full long-context processing. The approach avoids both native sparse training and heuristic token eviction, and it targets the quadratic attention bottleneck in long-context inference. Practitioners deploying long-context models could reduce inference cost with far less retraining than a sparse architecture trained from scratch would need.
paper
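
To make the per-head idea concrete, here is a minimal NumPy sketch of attention in which a few heads keep the full causal mask while the rest see only a local window. It is an illustration under stated assumptions, not the paper's method: the summary does not say which sparse pattern is used or how the full heads are chosen, so the sliding window, the `full_heads` set, and all function names here are hypothetical.

```python
import numpy as np

def causal_mask(seq_len):
    # True where query i may attend to key j (j <= i).
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def sliding_window_mask(seq_len, window):
    # Causal mask restricted to the `window` most recent keys.
    # ASSUMPTION: a sliding window stands in for the unspecified sparse pattern.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def per_head_masks(seq_len, num_heads, full_heads, window):
    # ASSUMPTION: `full_heads` is given; the paper's selection rule is not shown here.
    masks = np.empty((num_heads, seq_len, seq_len), dtype=bool)
    for h in range(num_heads):
        if h in full_heads:
            masks[h] = causal_mask(seq_len)
        else:
            masks[h] = sliding_window_mask(seq_len, window)
    return masks

def masked_attention(q, k, v, masks):
    # q, k, v: arrays of shape (num_heads, seq_len, head_dim).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores = np.where(masks, scores, -np.inf)
    # Every row keeps its diagonal entry, so the row maximum is finite.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
num_heads, seq_len, head_dim = 4, 8, 4
q, k, v = (rng.standard_normal((num_heads, seq_len, head_dim)) for _ in range(3))

masks = per_head_masks(seq_len, num_heads, full_heads={0}, window=2)
out = masked_attention(q, k, v, masks)

# Attended (query, key) pairs per head: the full head grows quadratically
# with seq_len, the windowed heads only linearly.
pairs_per_head = masks.sum(axis=(1, 2))
```

Applying a mask to dense scores, as done here, does not save any compute by itself. The cost reduction in practice comes from kernels and KV-cache layouts that skip the masked positions entirely.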