[arXiv]

Plasticity Loss Follows a Sublinear Scaling Law in Transformers Up to 314M Params

June 24, 2026

GPT-style transformers ranging from 5M to 314M non-embedding parameters all show plasticity loss in a multilingual continual learning setup, measured as degradation on a held-out Vietnamese probing task. The onset of plasticity loss scales sublinearly with model size: larger models delay the problem but do not eliminate it. Scale alone does not solve continual learning in LLMs.
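To make "onset" concrete, here is a minimal sketch of one way to locate it from a probe-accuracy curve. The summary does not say how the paper defines onset, so the criterion below (the first evaluation step where the held-out probe score drops more than a relative `tolerance` below its best value so far) is an assumption, and `plasticity_onset` is a hypothetical helper, not the authors' code.

```python
from typing import Optional, Sequence


def plasticity_onset(
    steps: Sequence[int],
    probe_scores: Sequence[float],
    tolerance: float = 0.05,
) -> Optional[int]:
    """Return the first step where the held-out probe score falls more than
    `tolerance` (relative) below its running best, or None if it never does.

    Hypothetical onset criterion for illustration; not the paper's definition.
    """
    if len(steps) != len(probe_scores):
        raise ValueError("steps and probe_scores must have the same length")
    best = float("-inf")
    for step, score in zip(steps, probe_scores):
        best = max(best, score)
        # Relative drop from the best probe score observed so far.
        if best > 0 and (best - score) / best > tolerance:
            return step
    return None


if __name__ == "__main__":
    # Made-up probe curve for illustration, not data from the paper.
    steps = [0, 1000, 2000, 3000, 4000]
    scores = [0.50, 0.60, 0.62, 0.61, 0.55]
    print(plasticity_onset(steps, scores))  # 4000: (0.62 - 0.55) / 0.62 > 0.05
```

A relative threshold keeps the criterion comparable across model sizes whose absolute probe scores differ; an absolute drop or a smoothed curve would be equally reasonable choices.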

HOW THIS AFFECTS YOU

● researcher: The sublinear scaling law for plasticity-loss onset gives you a concrete predictive handle for continual learning experiments, and the multilingual probing setup is a reusable evaluation protocol.
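One way to use that predictive handle is sketched below, assuming the law takes a power-law form, onset ≈ a · N^b, where N is the non-embedding parameter count. The summary states only that the scaling is sublinear, so the functional form is an assumption. Under it, "sublinear" means 0 < b < 1: doubling N multiplies the onset by 2^b, which is less than 2. The helpers `fit_power_law` and `predict_onset` are hypothetical, and the data in the usage section are synthetic, not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], onsets: Sequence[float]) -> Tuple[float, float]:
    """Fit onset ~= a * size**b by ordinary least squares on log-log values.

    Returns (a, b). An exponent 0 < b < 1 means onset grows sublinearly with size.
    """
    if len(sizes) != len(onsets) or len(sizes) < 2:
        raise ValueError("need at least two (size, onset) pairs of equal length")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in onsets]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept mapped back to linear scale
    return a, b


def predict_onset(a: float, b: float, size: float) -> float:
    """Extrapolate the fitted power law to a new model size."""
    return a * size ** b


if __name__ == "__main__":
    # Synthetic, noiseless data from onset = 1000 * (N / 5e6) ** 0.5 (hypothetical units:
    # training steps). These are NOT measurements from the paper.
    sizes = [5e6, 20e6, 80e6, 314e6]
    onsets = [1000 * (n / 5e6) ** 0.5 for n in sizes]
    a, b = fit_power_law(sizes, onsets)
    print(f"fitted exponent b = {b:.3f}")  # 0.500 on this noiseless data
    print(f"predicted onset at 1B params: {predict_onset(a, b, 1e9):.0f} steps")
```

Fitting in log-log space turns the power law into a straight line, so ordinary least squares recovers the exponent directly. With real, noisy onsets from only a handful of model sizes, extrapolation far beyond 314M parameters should be treated with caution.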
