GPT-2-Scale Models Cannot Discover Zero Without Additional Training Data
June 17, 2026
GPT-2-sized language models fail to independently generalize to the concept of zero at test time regardless of language pretraining, but improve substantially after exposure to tens of additional training examples. The study uses arithmetic as a proxy for evaluating out-of-distribution mathematical discovery in neural networks.
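To make the evaluation idea concrete, here is a minimal sketch of a held-out-zero split over single-digit addition. It illustrates the general technique only and is not the study's actual dataset or protocol: the function name `make_split`, the prompt format, and the operand range are all assumptions.

```python
import itertools

def make_split(max_operand=9, held_out="0"):
    """Split addition problems by whether the held-out digit appears anywhere.

    Hypothetical helper: the prompt format "a+b=" and the operand range
    are illustrative assumptions, not taken from the study.
    """
    train, test = [], []
    for a, b in itertools.product(range(max_operand + 1), repeat=2):
        prompt, answer = f"{a}+{b}=", str(a + b)
        # Any example whose text contains the held-out digit is test-only,
        # so the model never sees that symbol during training.
        if held_out in prompt + answer:
            test.append((prompt, answer))
        else:
            train.append((prompt, answer))
    return train, test

train, test = make_split()
```

Under this split, moving a few dozen examples from `test` into `train` would correspond to the "tens of additional training examples" condition described above.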
HOW THIS AFFECTS YOU
● Researcher: Provides a concrete, reproducible case study on the limits of compositional generalization in LLMs for mathematical reasoning, relevant to evaluating frontier model claims about novel discovery.