[HUGGINGFACE] score: 0.71
Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR
May 18, 2026
Rubric-based reward aggregation in RLVR conflates a criterion's importance with its usefulness as an optimization signal; the paper proposes policy-aware weighting to address criteria that the current policy has already saturated.
paper