Accumulated Orthogonal Transformations Explain Length Extrapolation in Transformers
June 25, 2026
Replacing RoPE's position-indexed angles with accumulated token-dependent SO(2) rotations reproduces PaTH Attention's length extrapolation pattern, showing that the effect is not specific to Householder transformations. A proof generalizes the result to any accumulated orthogonal transformation that satisfies regularity conditions: products of such transformations become incoherent after finitely many steps, which creates a finite mixing window independent of context length. Degradation at extreme lengths follows from the same mechanism.
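To make the contrast concrete, here is a minimal sketch of the two angle schemes on a single 2-D feature pair. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the per-token angles are plain numbers standing in for whatever token-dependent function a real model would learn.

```python
import math

def rotate(vec, angle):
    """Apply the SO(2) rotation by `angle` (radians) to a 2-D vector."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def rope_angles(num_tokens, omega):
    """RoPE-style angles: position t gets t * omega, regardless of content."""
    return [t * omega for t in range(num_tokens)]

def accumulated_angles(token_angles):
    """Accumulated angles: position t gets the running sum of the per-token
    angles up to and including t. `token_angles` is an assumed stand-in for
    a learned, token-dependent angle function."""
    total, out = 0.0, []
    for angle in token_angles:
        total += angle
        out.append(total)
    return out

def logit(query, key, angle_q, angle_k):
    """Dot product of the rotated query and rotated key. Because rotations
    are orthogonal, the result depends only on angle_k - angle_q."""
    q = rotate(query, angle_q)
    k = rotate(key, angle_k)
    return q[0] * k[0] + q[1] * k[1]

if __name__ == "__main__":
    q, k = (1.0, 0.5), (0.3, -0.7)
    # Hypothetical per-token angles; in a model these would vary with the token.
    acc = accumulated_angles([0.05, 0.40, 0.00, 0.22, 0.31])
    rope = rope_angles(5, omega=0.1)
    print("RoPE logit (i=4, j=1):       ", logit(q, k, rope[4], rope[1]))
    print("accumulated logit (i=4, j=1):", logit(q, k, acc[4], acc[1]))
```

When every token contributes the same angle, the accumulated scheme gives the same relative rotations as RoPE, so the two differ only in that the accumulated angle between two positions depends on the tokens that lie between them.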
HOW THIS AFFECTS YOU
● Researcher: This theoretical unification of accumulated-transformation approaches gives a principled basis for designing new positional encoding schemes that target length extrapolation without committing to Householder-specific architectures.