[arXiv]score: 0.37
Unsteady Metrics and Benchmarking Cultures of AI Model Builders
May 15, 2026
Analysis of benchmark selection practices in AI model builder press releases and blog posts, introducing Benchmarking-Cultures-25 dataset to examine how companies choose which benchmarks to highlight and what this reveals about state-of-the-art claims.
cs.AI