[arXiv] score: 0.41
Optimistic Dual Averaging Unifies Modern Optimizers
May 13, 2026
SODA unifies modern optimizers, including Muon, Lion, AdEMAMix, and NAdam, under a generalized Optimistic Dual Averaging framework and introduces a theoretically grounded 1/k weight decay schedule that eliminates manual tuning. Empirical results show consistent gains across scales with no additional hyperparameters. Optimization researchers and large-scale training practitioners should take note.
cs.LG