Static Fibonacci Spacing Outperforms Learned Dilation in Sparse Attention
June 30, 2026
Fibonacci-spaced offsets with static per-layer staggering outperform learned alpha schedules and power-of-2 controls in sparse self-attention. Testing across 21 models shows that static schedules achieve better perplexity without the five-fold inference latency penalty associated with learned dilation parameters.
HOW THIS AFFECTS YOU
●
builderYou can reduce inference latency by using static Fibonacci spacing instead of learned dilation methods.
●
researcherYou can replace complex learned dilation with simpler static staggered schedules for better perplexity.