[arXiv]score: 0.37

Unsteady Metrics and Benchmarking Cultures of AI Model Builders

May 15, 2026

Analysis of benchmark selection practices in AI model builder press releases and blog posts, introducing Benchmarking-Cultures-25 dataset to examine how companies choose which benchmarks to highlight and what this reveals about state-of-the-art claims.

cs.AI

SOURCE

https://arxiv.org/abs/2605.14164

← back to feed