[arXiv] score: 0.37
Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction
May 15, 2026
Collider-Bench is a new benchmark that evaluates LLM agents on reproducing real LHC particle physics analyses using only public papers and open-source software, targeting long-horizon scientific tool use. It exposes gaps between agent capabilities and genuine scientific reproducibility, and it serves as a rigorous stress test for frontier agents beyond existing coding or reasoning benchmarks.
cs.LG, cs.AI, hep-ex, hep-ph