[arXiv] score: 0.51
Stackelberg Framework Explains Why Layer-Specific Learning Rates Speed Training
May 26, 2026
Using smaller learning rates for the body layers and a larger one for the final layer can be formalized as two-time-scale alternating gradient descent on a Stackelberg game reformulation of training, with finite-time convergence guarantees that hold under non-smooth activations and constraints.
cs.LG
HOW THIS AFFECTS YOU
● Researcher: Provides theoretical grounding for an empirically observed training heuristic, with convergence proofs that may inform principled learning rate schedule design.
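
To make the heuristic concrete, here is a minimal sketch of alternating gradient descent with two step sizes on a toy one-hidden-layer ReLU network. It is an illustration only, not the paper's algorithm: the data, the initial weights, the step sizes `lr_head` and `lr_body`, the order of the two updates, and the choice to treat the final layer as the fast player are all assumptions made for this example.

```python
# Toy sketch (assumed setup, not the paper's method): alternate a large-step
# update of the final layer with a small-step update of the body layer.

def relu(z):
    return z if z > 0.0 else 0.0

def forward(x, w, b, a):
    # Body: hidden_j = relu(w_j * x + b_j). Head: prediction = sum_j a_j * hidden_j.
    hidden = [relu(wj * x + bj) for wj, bj in zip(w, b)]
    return sum(aj * hj for aj, hj in zip(a, hidden)), hidden

def loss(xs, ys, w, b, a):
    # Half mean squared error over the dataset.
    total = sum((forward(x, w, b, a)[0] - y) ** 2 for x, y in zip(xs, ys))
    return 0.5 * total / len(xs)

def head_grad(xs, ys, w, b, a):
    # Gradient of the loss with respect to the final-layer weights a.
    g = [0.0] * len(a)
    for x, y in zip(xs, ys):
        pred, hidden = forward(x, w, b, a)
        r = pred - y
        for j, hj in enumerate(hidden):
            g[j] += r * hj / len(xs)
    return g

def body_grad(xs, ys, w, b, a):
    # (Sub)gradient with respect to body parameters w and b.
    # ReLU is non-smooth at 0; this sketch picks subgradient 0 at the kink.
    gw = [0.0] * len(w)
    gb = [0.0] * len(b)
    for x, y in zip(xs, ys):
        pred, hidden = forward(x, w, b, a)
        r = pred - y
        for j, hj in enumerate(hidden):
            active = 1.0 if hj > 0.0 else 0.0
            gw[j] += r * a[j] * active * x / len(xs)
            gb[j] += r * a[j] * active / len(xs)
    return gw, gb

def train(steps=200, lr_head=0.1, lr_body=0.01):
    # Hypothetical data: fit y = |x| on five points.
    xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
    ys = [abs(x) for x in xs]
    w = [1.0, -1.0, 0.5, -0.5]
    b = [0.0, 0.0, 0.1, 0.1]
    a = [0.1, 0.1, 0.1, 0.1]
    initial = loss(xs, ys, w, b, a)
    for _ in range(steps):
        # Fast time scale: final layer moves with the larger step size.
        ga = head_grad(xs, ys, w, b, a)
        a = [aj - lr_head * g for aj, g in zip(a, ga)]
        # Slow time scale: body moves with the smaller step size,
        # using gradients evaluated at the already-updated head.
        gw, gb = body_grad(xs, ys, w, b, a)
        w = [wj - lr_body * g for wj, g in zip(w, gw)]
        b = [bj - lr_body * g for bj, g in zip(b, gb)]
    return initial, loss(xs, ys, w, b, a)
```

Calling `train()` returns the loss before and after training, so the effect of changing the ratio `lr_head / lr_body` can be checked directly on this toy problem.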