[arXiv]

0.1–0.3% Procedural Pretraining Data Boosts Context Recall from 10% to 98%

May 29, 2026

Front-loading as little as 0.1–0.3% of the training data as procedural data, generated from formal languages such as the Dyck language of balanced brackets, before standard web-corpus pretraining significantly improves algorithmic skills in models of up to 1.3B parameters. With Dyck-sequence pretraining, needle-in-a-haystack (context recall) accuracy rises from 10% to 98%, and gains also appear on natural-language, code, and math benchmarks.
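To make the setup concrete, here is a minimal sketch of Dyck-style procedural data placed ahead of a web corpus. It assumes a two-bracket vocabulary, a simple stack-based sampler (which does not sample uniformly over Dyck words), and character counts as a stand-in for tokens. The names `random_dyck`, `is_dyck`, `procedural_budget`, and `front_loaded` are hypothetical. The summary does not describe the paper's actual generator, sequence lengths, or mixing schedule, so treat this as an illustration only.

```python
import random

# Hypothetical two-type bracket vocabulary; the paper's vocabulary is not given here.
PAIRS = [("(", ")"), ("[", "]")]
OPENERS = {o for o, _ in PAIRS}
CLOSE_TO_OPEN = {c: o for o, c in PAIRS}


def random_dyck(n_pairs: int, rng: random.Random) -> str:
    """Sample a balanced bracket string with n_pairs pairs (not uniform over Dyck words)."""
    stack, out, opens_left = [], [], n_pairs
    while opens_left or stack:
        # Open if brackets remain and the stack is empty or a coin flip says so; otherwise close.
        if opens_left and (not stack or rng.random() < 0.5):
            opener, closer = rng.choice(PAIRS)
            stack.append(closer)
            out.append(opener)
            opens_left -= 1
        else:
            out.append(stack.pop())
    return "".join(out)


def is_dyck(s: str) -> bool:
    """Return True if s is a well-nested string over the bracket vocabulary."""
    stack = []
    for ch in s:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in CLOSE_TO_OPEN:
            if not stack or stack.pop() != CLOSE_TO_OPEN[ch]:
                return False
        else:
            return False
    return not stack


def procedural_budget(total_tokens: int, fraction: float) -> int:
    """Procedural-data budget as a fraction of the total pretraining budget."""
    return round(total_tokens * fraction)


def front_loaded(web_docs, budget_chars: int, rng: random.Random, max_pairs: int = 32):
    """Yield Dyck sequences until the budget is met, then every web document in order."""
    used = 0
    while used < budget_chars:
        seq = random_dyck(rng.randint(1, max_pairs), rng)
        used += len(seq)  # characters stand in for tokens in this sketch
        yield seq
    yield from web_docs
```

For example, a hypothetical 10B-token run at 0.1% or 0.3% would reserve `procedural_budget(10_000_000_000, 0.001)` or `procedural_budget(10_000_000_000, 0.003)` tokens, that is 10M or 30M, for procedural data seen before any web text.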

cs.CL, cs.LG

HOW THIS AFFECTS YOU

● Researcher: A concrete, low-cost intervention (under 0.3% of the data budget) with large measured effects on context recall and reasoning; worth replicating at larger scales to test whether the gains persist.

SOURCE

https://arxiv.org/abs/2601.21725
