
Perturbation is All You Need for Extrapolating Language Models

May 7, 2026

An arXiv preprint (2605.04344) proposes replacing standard autoregressive next-token prediction with a perturbation-based training framework: the model conditions on semantically perturbed prefixes rather than exact ones, which creates a hierarchical pre-post-additive noise structure. The authors report out-of-support generalization gains with in-distribution performance preserved, and they back the method with a formal theory of extrapolability. The work is relevant to LLM pretraining teams and to researchers tackling distribution shift because it challenges the foundational assumption that exact-prefix conditioning is optimal, and it offers a theoretically grounded alternative to standard cross-entropy training with no architectural overhead.

stat.ML, cs.LG, math.ST, stat.TH

SOURCE

https://arxiv.org/abs/2605.04344
