[arXiv] score: 0.48
Uniform Scaling Limits in AdamW-Trained Transformers
May 13, 2026
Theoretical analysis proving L² convergence of the hidden-state dynamics of AdamW-trained transformers, modeled as interacting particle systems, at rate O(L^{-1} + L^{-1/3} H^{-1/2}), where L is the depth and H is the number of heads.
stat.ML, cs.LG, math.PR
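
To get a feel for the stated rate, the short Python sketch below evaluates the two terms L^{-1} and L^{-1/3} H^{-1/2} for a given depth and head count. It is only an illustration: the constant hidden in the O(.) is not given in the summary and is assumed to be 1 here, and the function names `rate_bound` and `crossover_depth` are hypothetical, not taken from the paper.

```python
def rate_bound(depth: int, heads: int) -> float:
    """Evaluate L^-1 + L^(-1/3) * H^(-1/2) with the O(.) constant assumed to be 1."""
    if depth <= 0 or heads <= 0:
        raise ValueError("depth and heads must be positive")
    depth_term = depth ** -1.0
    mixed_term = depth ** (-1.0 / 3.0) * heads ** -0.5
    return depth_term + mixed_term


def crossover_depth(heads: int) -> float:
    """Depth at which the two terms are equal.

    L^-1 = L^(-1/3) * H^(-1/2)  =>  L^(2/3) = H^(1/2)  =>  L = H^(3/4).
    For deeper networks the mixed term L^(-1/3) * H^(-1/2) is the larger one.
    """
    if heads <= 0:
        raise ValueError("heads must be positive")
    return heads ** 0.75


if __name__ == "__main__":
    for depth, heads in [(8, 16), (64, 16), (64, 256)]:
        print(depth, heads, rate_bound(depth, heads), crossover_depth(heads))
```

Under this reading, once the depth exceeds roughly H^{3/4} the mixed term dominates, so increasing the number of heads is what shrinks the bound further.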