[arXiv]

Accumulated Orthogonal Transformations Explain Length Extrapolation in Transformers

June 25, 2026

Replacing RoPE's position-indexed angles with accumulated token-dependent SO(2) rotations reproduces PaTH Attention's length extrapolation pattern, showing that the effect is not specific to Householder transformations. A proof generalizes the result to any accumulated orthogonal transformations that satisfy regularity conditions: their products become incoherent after finitely many steps, which creates a finite mixing window independent of context length. Degradation at extreme lengths follows from the same mechanism.
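To make the mechanism concrete, here is a minimal NumPy sketch of accumulated SO(2) rotations for a single 2-D head. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, and the per-token angles are drawn i.i.d. from a uniform distribution as a stand-in for angles that a model would compute from token content. The regularity conditions used in the proof are not reproduced here.

```python
import numpy as np

def rotate(vecs, angles):
    """Rotate each 2-D row of `vecs` by the matching entry of `angles`."""
    c, s = np.cos(angles), np.sin(angles)
    x, y = vecs[:, 0], vecs[:, 1]
    return np.stack([c * x - s * y, s * x + c * y], axis=-1)

def accumulated_rotation_logits(q, k, step_angles):
    """Attention logits <R(phi_i) q_i, R(phi_j) k_j>.

    phi_t is the running sum of the per-token angles, so each logit depends
    on the angles only through the difference phi_i - phi_j.
    """
    phi = np.cumsum(step_angles)
    return rotate(q, phi) @ rotate(k, phi).T

def mean_rotation_coherence(distance, n_samples=20000, seed=0):
    """Length of the average unit vector after `distance` accumulated steps.

    Assumption for illustration only: angles are i.i.d. uniform on [-1, 1].
    Values near 1 mean the accumulated rotation is still predictable;
    values near 0 mean it has become incoherent.
    """
    rng = np.random.default_rng(seed)
    total = rng.uniform(-1.0, 1.0, size=(n_samples, distance)).sum(axis=1)
    return np.hypot(np.cos(total).mean(), np.sin(total).mean())

rng = np.random.default_rng(0)
T = 512
step_angles = rng.uniform(-1.0, 1.0, size=T)  # hypothetical token-dependent angles
q = rng.normal(size=(T, 2))
k = rng.normal(size=(T, 2))
logits = accumulated_rotation_logits(q, k, step_angles)
```

Setting every entry of `step_angles` to the same constant recovers a single-frequency RoPE, where the logit depends only on the index difference. With varying angles, `mean_rotation_coherence` shrinks as `distance` grows in this toy setting, which is one way to picture a mixing window whose size does not depend on the total sequence length `T`.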

HOW THIS AFFECTS YOU

● Researcher: This theoretical unification of accumulated-transformation approaches gives a principled basis for designing new positional encoding schemes that target length extrapolation without committing to Householder-specific architectures.

