[arXiv] score: 0.47
Variational Linear Attention: Stable Associative Memory for Long-Context Transformers
May 13, 2026
Variational Linear Attention (VLA) reframes the memory update in linear attention as online regularized least squares with an adaptive penalty, computed via the Sherman-Morrison formula. The resulting update has a Jacobian with unit spectral norm, which prevents progressive interference in long-context transformers.
cs.LG
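
The summary above names two ingredients: online regularized least squares and the Sherman-Morrison formula. The sketch below shows how the two fit together in a generic recursive least-squares associative memory. It is not the paper's algorithm: it assumes a fixed ridge penalty `lam` rather than VLA's adaptive penalty, and the names `sherman_morrison_update` and `rls_memory` are hypothetical, chosen only for illustration.

```python
import numpy as np


def sherman_morrison_update(P, k):
    """Given P = A^{-1} for symmetric A, return (A + k k^T)^{-1}.

    Sherman-Morrison: (A + k k^T)^{-1} = P - (P k)(P k)^T / (1 + k^T P k).
    """
    Pk = P @ k
    return P - np.outer(Pk, Pk) / (1.0 + k @ Pk)


def rls_memory(keys, values, lam=1.0):
    """Online ridge-regression memory M that maps each key to its value.

    After seeing all pairs, M minimizes
        sum_i ||v_i - M k_i||^2 + lam * ||M||_F^2.
    The fixed penalty `lam` is an assumption of this sketch.
    """
    d_k = keys.shape[1]
    d_v = values.shape[1]
    P = np.eye(d_k) / lam        # inverse of the initial matrix lam * I
    M = np.zeros((d_v, d_k))     # empty memory
    for k, v in zip(keys, values):
        P = sherman_morrison_update(P, k)    # rank-one inverse update, no re-inversion
        M = M + np.outer(v - M @ k, P @ k)   # write only the prediction error
    return M, P


rng = np.random.default_rng(0)
K = rng.normal(size=(20, 4))     # 20 keys of dimension 4
V = rng.normal(size=(20, 3))     # 20 values of dimension 3
M, P = rls_memory(K, V, lam=0.5)

# The online result agrees with the batch ridge solution.
A = K.T @ K + 0.5 * np.eye(4)
print(np.allclose(M, V.T @ K @ np.linalg.inv(A)))
```

Each step costs O(d_k^2) for the inverse update rather than the O(d_k^3) of a fresh inversion, and the write is error-corrective: a key the memory already predicts well changes `M` very little, which is the intuition behind reduced interference.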