Plasticity Loss Follows a Sublinear Scaling Law in Transformers up to 314M Parameters
June 24, 2026
GPT-style transformers ranging from 5M to 314M non-embedding parameters all exhibit plasticity loss in a multilingual continual learning setup, measured as performance degradation on a held-out Vietnamese probing task. The onset of this loss scales sublinearly with model size: larger models delay the problem but do not eliminate it. Scale alone does not solve continual learning in LLMs.
HOW THIS AFFECTS YOU
● Researcher: The sublinear scaling law for plasticity-loss onset gives you a concrete predictive handle for continual learning experiments, and the multilingual probing setup is a reusable evaluation protocol.
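One way to use that predictive handle is to fit the onset trend from your own runs and extrapolate it. The sketch below assumes the onset follows a pure power law, onset ≈ c·N^α, where N is the non-embedding parameter count. The source states only that onset scales sublinearly, so the power-law form, the function names fit_power_law and predict_onset, and the unit of "onset" (for example, training steps or tokens before the Vietnamese probe drops past a threshold you choose) are all hypothetical. A fitted α below 1 corresponds to sublinear scaling.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(params: Sequence[float], onsets: Sequence[float]) -> Tuple[float, float]:
    """Fit onset ~= c * N**alpha by ordinary least squares in log-log space.

    Hypothetical helper: assumes a pure power law, which the source does not confirm.
    Returns (c, alpha); alpha < 1 means onset grows sublinearly with model size.
    """
    if len(params) != len(onsets) or len(params) < 2:
        raise ValueError("need at least two (params, onset) pairs of equal length")
    # Taking logs turns onset = c * N**alpha into log(onset) = log(c) + alpha * log(N),
    # a straight line whose slope is the scaling exponent.
    xs = [math.log(n) for n in params]
    ys = [math.log(t) for t in onsets]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("model sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    log_c = mean_y - alpha * mean_x
    return math.exp(log_c), alpha


def predict_onset(c: float, alpha: float, n_params: float) -> float:
    """Extrapolate the onset for a model with n_params non-embedding parameters."""
    return c * n_params ** alpha
```

With α < 1, doubling N multiplies the predicted onset by 2^α, which is less than 2: the delay grows with model size, but more slowly than the model itself, which is the "delay but do not eliminate" pattern described above. Predictions beyond the sizes you measured are extrapolations of the fitted trend.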