[arXiv] score: 0.48

Uniform Scaling Limits in AdamW-Trained Transformers

May 13, 2026
A theoretical analysis proving L² (mean-square) convergence of the hidden-state dynamics of AdamW-trained transformers, modeled as interacting particle systems, at rate O(L^(-1) + L^(-1/3) H^(-1/2)), where L is the depth and H is the number of attention heads.
stat.ML, cs.LG, math.PR