[arXiv] score: 0.06
CS Researchers Distrust LLM Leaderboards But Keep Using Them Anyway
May 29, 2026
Interviews with eight CS researchers across four subfields reveal near-universal pragmatic skepticism: deep distrust of benchmark leaderboards paired with continued use as rough decision aids. Peer networks dominate actual model selection, and arena-style human-voting leaderboards are preferred over static benchmarks.
cs.CL, cs.HC
HOW THIS AFFECTS YOU
● researcher: Confirms that leaderboard performance is a weak signal for model selection in practice; peer reputation and subfield norms matter more.
● founder: If your product relies on leaderboard positioning for credibility, this suggests that technical peer communities discount it; direct practitioner endorsements carry more weight.