
Variational Linear Attention: Stable Associative Memory for Long-Context Transformers

May 13, 2026

Variational Linear Attention (VLA) reframes the memory update in linear attention as an online regularized least-squares problem with an adaptive penalty, solved with the Sherman-Morrison formula. The resulting update has a Jacobian of unit spectral norm, which prevents progressive interference in long-context transformers.
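To make the least-squares framing concrete, here is a minimal sketch of a generic recursive least-squares memory write that uses the Sherman-Morrison identity. It is not the paper's algorithm: it assumes a fixed ridge penalty `lam` rather than VLA's adaptive penalty, and the helper names (`init_memory`, `sherman_morrison_update`, `rls_write`) are hypothetical. It only illustrates how a rank-one update to the inverse Gram matrix lets an associative memory `M` absorb one key/value pair per step without re-solving the full problem.

```python
# Illustrative sketch only: standard ridge-regularized recursive least squares.
# Assumption: fixed penalty `lam`; the adaptive penalty described for VLA is NOT modeled.

def matvec(A, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def init_memory(d_k, d_v, lam):
    """Empty memory M (d_v x d_k) and P = (lam * I)^{-1} (d_k x d_k)."""
    M = [[0.0] * d_k for _ in range(d_v)]
    P = [[(1.0 / lam if i == j else 0.0) for j in range(d_k)] for i in range(d_k)]
    return M, P

def sherman_morrison_update(P, k):
    """Given symmetric P = A^{-1}, return (A + k k^T)^{-1} via Sherman-Morrison."""
    Pk = matvec(P, k)
    denom = 1.0 + sum(a * b for a, b in zip(k, Pk))
    n = len(k)
    return [[P[i][j] - Pk[i] * Pk[j] / denom for j in range(n)] for i in range(n)]

def rls_write(M, P, k, v):
    """Write the association k -> v into M; returns the updated (M, P)."""
    P_new = sherman_morrison_update(P, k)
    gain = matvec(P_new, k)                               # P_new @ k
    err = [vi - pi for vi, pi in zip(v, matvec(M, k))]    # v - M @ k
    M_new = [[M[i][j] + err[i] * gain[j] for j in range(len(k))]
             for i in range(len(v))]
    return M_new, P_new

# Usage: store two associations, then read the first one back.
M, P = init_memory(d_k=2, d_v=2, lam=1e-3)
M, P = rls_write(M, P, [1.0, 0.0], [2.0, 3.0])
M, P = rls_write(M, P, [0.0, 1.0], [-1.0, 4.0])
recalled = matvec(M, [1.0, 0.0])   # close to [2.0, 3.0], shrunk slightly by lam
```

In this sketch the second write leaves the first association intact because the gain vector is computed from the inverse Gram matrix rather than from the raw key, which is the kind of interference control the least-squares view is meant to provide.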

cs.LG

SOURCE

https://arxiv.org/abs/2605.11196
