Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction

May 15, 2026

Collider-Bench is a new benchmark that evaluates LLM agents on reproducing real LHC particle physics analyses using only public papers and open-source software, targeting long-horizon scientific tool use. It exposes gaps between agent capabilities and genuine scientific reproducibility, and serves as a rigorous stress test for frontier agents that goes beyond existing coding and reasoning benchmarks.

cs.LG, cs.AI, hep-ex, hep-ph

SOURCE

https://arxiv.org/abs/2605.13950
