CUSP Benchmark: LLMs Predict AI Benchmark Progress Well but Fail on Biology and Physics Breakthroughs
May 26, 2026
Across 4,760 scientific events, frontier LLMs can identify promising research directions but cannot reliably predict whether or when breakthroughs will occur in biology and physics, while performing notably better at forecasting AI benchmark outcomes.
HOW THIS AFFECTS YOU
● Researcher: The asymmetry (good at AI benchmark forecasting, poor at biology and physics) suggests models are pattern-matching on the training distribution rather than reasoning about scientific progress, which matters for AI Scientist-type systems.
● Policy: The finding that LLM forecasting limitations cannot be explained by training data volume alone complicates arguments for scaling as a path to reliable scientific advisory AI.