[arXiv] score: 0.20
0.1–0.3% Procedural Pretraining Data Boosts Context Recall from 10% to 98%
May 29, 2026
Front-loading as little as 0.1–0.3% procedural data, generated from formal languages such as Dyck sequences, before standard web-corpus pretraining significantly improves algorithmic skills in models of up to 1.3B parameters. With Dyck-sequence pretraining, needle-in-a-haystack accuracy jumps from 10% to 98%, and gains also appear on natural language, code, and math benchmarks.
cs.CL cs.LG
HOW THIS AFFECTS YOU
● researcher: A concrete, low-cost intervention (0.1–0.3% of the data budget) with large measured effects on context recall and reasoning; worth replicating at larger scales to assess whether the gains persist.
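
For anyone attempting a replication, the following is a minimal sketch of how Dyck-style procedural data might be generated and budgeted. The summary does not describe the paper's actual generator, so the bracket vocabulary `BRACKETS`, the sampling rule controlled by `p_open`, and the helper names `dyck_sequence`, `is_balanced`, and `procedural_token_budget` are all assumptions made for illustration.

```python
import random

# Hypothetical bracket vocabulary; the source does not specify one.
BRACKETS = [("(", ")"), ("[", "]"), ("{", "}")]


def dyck_sequence(n_pairs, rng, p_open=0.5):
    """Return a random balanced bracket string with exactly n_pairs pairs.

    Assumed sampling rule: open a new bracket with probability p_open
    whenever that is allowed, otherwise close the innermost open bracket.
    """
    out = []
    stack = []  # closing tokens still owed, innermost last
    opened = 0
    while opened < n_pairs or stack:
        can_open = opened < n_pairs
        if can_open and (not stack or rng.random() < p_open):
            open_tok, close_tok = rng.choice(BRACKETS)
            out.append(open_tok)
            stack.append(close_tok)
            opened += 1
        else:
            # Reached only when at least one bracket is still open.
            out.append(stack.pop())
    return "".join(out)


def is_balanced(seq):
    """Check that seq is a well-formed Dyck word over BRACKETS."""
    closer_for = dict(BRACKETS)
    stack = []
    for tok in seq:
        if tok in closer_for:
            stack.append(closer_for[tok])
        elif not stack or stack.pop() != tok:
            return False
    return not stack


def procedural_token_budget(total_tokens, fraction):
    """Number of tokens to front-load as procedural data (e.g. 0.001 to 0.003)."""
    return round(total_tokens * fraction)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for _ in range(3):
        print(dyck_sequence(8, rng))
    print(procedural_token_budget(1_000_000, 0.001))
```

In a replication, sequences like these would be tokenized and placed ahead of the web corpus until the chosen budget is used up; how the paper tokenizes and orders them is not stated in this summary.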