[arXiv] score: 0.07

Frontier LLMs Cannot Independently Find Errors in Economic Theory Papers

June 5, 2026

Across four published economics papers with known errors, no model (including ChatGPT Pro, Claude, and Gemini) located a true error without substantial human guidance. A human-AI pair outperformed AI working alone and likely outperforms current peer review, but data contamination limits clean interpretation.

HOW THIS AFFECTS YOU

● Researcher: Calibrates expectations for AI-assisted formal verification; current models need expert scaffolding to catch subtle theoretical errors.

● Policy: Relevant to ongoing debates about AI's role in peer review and scientific validation pipelines.
