[arXiv] score: 0.06

CS Researchers Distrust LLM Leaderboards But Keep Using Them Anyway

May 29, 2026

Interviews with eight CS researchers across four subfields reveal near-universal pragmatic skepticism: deep distrust of benchmark leaderboards paired with continued use as rough decision aids. Peer networks dominate actual model selection, and arena-style human-voting leaderboards are preferred over static benchmarks.

cs.CL, cs.HC

HOW THIS AFFECTS YOU

● Researcher: Confirms that leaderboard performance is a weak signal for model selection in practice; peer reputation and subfield norms matter more.

● Founder: If your product relies on leaderboard positioning for credibility, this suggests that technical peer communities discount it; direct practitioner endorsements carry more weight.

SOURCE

https://arxiv.org/abs/2605.28966
