Frontier LLMs Cannot Independently Find Errors in Economic Theory Papers
June 5, 2026
Across four published economics papers with known errors, no model (including ChatGPT Pro, Claude, and Gemini) located a true error without substantial human guidance. A human-AI pair outperformed solo AI and likely outperforms current peer review, but data contamination limits clean interpretation.
HOW THIS AFFECTS YOU
● Researcher: Calibrates expectations for AI-assisted formal verification; current models need expert scaffolding to catch subtle theoretical errors.
● Policy: Relevant to ongoing debates about AI's role in peer review and scientific validation pipelines.