Small Parameter Initialization Consistently Improves LLM Pretraining, Especially on Reasoning Tasks
June 17, 2026
Reducing the parameter initialization scale improves LLM pretraining performance, with the largest gains on reasoning-heavy tasks. It also reveals a two-phase developmental trajectory in which parameters first compress and then expand into richer representations. The work identifies two common empirical settings that suppress this benefit; relaxing them restores favorable scaling.
HOW THIS AFFECTS YOU
● Researcher: Initialization scale is a controllable, low-cost lever for improving reasoning capacity during pretraining. The suppressive empirical settings identified are worth auditing in your own training configs.
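
To make the initialization-scale lever concrete, here is a minimal sketch of what "reducing initialization scale" can look like in code. It is not the authors' method: the zero-mean Gaussian scheme, the baseline standard deviation of 0.02, the reduction factor of 0.1, and the helper names `init_matrix` and `empirical_std` are all assumptions chosen for illustration, and the summary above does not say which scheme or factor was used.

```python
import math
import random

# Assumption: 0.02 is a commonly used default std, not a value from the source.
BASELINE_STD = 0.02


def init_matrix(rows, cols, scale=1.0, base_std=BASELINE_STD, seed=0):
    """Sample a rows x cols weight matrix from N(0, (scale * base_std)^2)."""
    rng = random.Random(seed)
    std = scale * base_std
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def empirical_std(matrix):
    """Population standard deviation over all entries of the matrix."""
    values = [v for row in matrix for v in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


# Baseline initialization versus a smaller one (hypothetical factor of 0.1).
baseline = init_matrix(64, 64, scale=1.0)
small = init_matrix(64, 64, scale=0.1)

ratio = empirical_std(small) / empirical_std(baseline)
print(f"std ratio small/baseline: {ratio:.3f}")
```

Exposing the scale as a single multiplier keeps it easy to sweep alongside other hyperparameters. When auditing a config, it helps to check that no other setting silently rescales or overrides the initial weights.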