[arXiv]

Pass@k Misclassifies Up to 22.9% of Math Problems as Unsolvable

June 19, 2026

On GSM8K and MATH across four open-weight models, 10.3–22.9% of problems that fail all six sampled chains are actually solvable using deterministic decoding with activation grafting perturbations at matched compute. This challenges pass@k as a reliable difficulty signal for RL curricula, data curation, and verifier training.
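For readers who want to see what a zero-pass-rate filter looks like in practice, here is a minimal sketch. It assumes the commonly used unbiased pass@k estimator from the code-generation evaluation literature; the summary does not say which estimator the paper uses. The function names (`pass_at_k`, `zero_pass_filter`, `prob_all_fail`) are hypothetical. The six-sample setting is taken from the summary above.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled chains, c of which are correct.

    Assumption: this is the standard combinatorial estimator,
    1 - C(n - c, k) / C(n, k); the paper may compute pass@k differently.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so every size-k subset has a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def zero_pass_filter(results: dict[str, list[bool]]) -> list[str]:
    """Return ids of problems whose sampled chains all failed.

    Hypothetical helper: this is the hard filter a curriculum or
    data-curation pipeline might apply to label problems 'unsolvable'.
    """
    return [pid for pid, outcomes in results.items() if not any(outcomes)]


def prob_all_fail(p: float, k: int) -> float:
    """Chance that k independent samples all fail, given per-sample success p.

    Illustrative only: assumes independent samples, which is a
    simplification and not a claim made by the paper.
    """
    return (1.0 - p) ** k


if __name__ == "__main__":
    # Made-up outcomes for two problems, six sampled chains each.
    results = {
        "prob_a": [False] * 6,
        "prob_b": [False, True, False, False, False, False],
    }
    print(zero_pass_filter(results))
    print(pass_at_k(n=6, c=0, k=6))
    print(prob_all_fail(0.1, 6))
```

An empirical pass@6 of zero only says that six particular samples failed. Under the independence assumption in `prob_all_fail`, a problem with a 10% per-sample success rate would still fail all six chains a little over half the time, so a hard filter of this kind can discard problems the model is able to solve.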

HOW THIS AFFECTS YOU

● Researcher: This directly undermines pass@k as a ground-truth difficulty proxy. It is worth revisiting any RL or curriculum pipeline that uses a zero pass rate as a hard filter.
