[arXiv] score: 0.41
The Evaluation Trap: Benchmark Design as Theoretical Commitment
May 15, 2026
A theoretical critique arguing that benchmark design operationalizes unexamined assumptions, which progressively narrow how capabilities are defined. The result is an evaluation trap: metrics stop tracking independent capabilities and instead produce benchmark-defined versions of their targets.
cs.AI, cs.CY