Pass@k Misclassifies Up to 22.9% of Math Problems as Unsolvable
June 19, 2026
On GSM8K and MATH, across four open-weight models, 10.3–22.9% of the problems that fail all six sampled chains turn out to be solvable by deterministic decoding with activation-grafting perturbations at matched compute. This challenges pass@k as a reliable difficulty signal for RL curricula, data curation, and verifier training.
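For context, here is a minimal sketch of how a zero-pass-rate label is commonly derived from sampled chains. It assumes the widely used unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k), for n samples of which c are correct. The function names `pass_at_k` and `zero_pass_ids` and the example problem IDs are hypothetical, not taken from the study.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled chains, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def zero_pass_ids(correct_counts: dict[str, int], n: int = 6) -> list[str]:
    """Hypothetical hard filter: IDs of problems where all n samples failed."""
    return sorted(pid for pid, c in correct_counts.items()
                  if pass_at_k(n, c, n) == 0.0)


# Illustrative counts only (not data from the study): correct chains out of six.
counts = {"prob-a": 0, "prob-b": 2, "prob-c": 0}
flagged = zero_pass_ids(counts)  # problems a hard filter would treat as unsolvable
```

With six samples, a problem is flagged only when none of them is correct. That flag describes the sampling procedure, not the problem itself, which is the gap the reported 10.3–22.9% figure points to.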
HOW THIS AFFECTS YOU
● Researcher: This directly undermines pass@k as a ground-truth difficulty proxy. It is worth revisiting any RL or curriculum pipeline that uses zero-pass-rate as a hard filter.