[HUGGINGFACE] score: 0.71

Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR

May 18, 2026

Rubric-based reward aggregation in RLVR conflates a criterion's importance with how much optimization signal it still provides. The paper proposes policy-aware weighting to address criteria that the current policy has already saturated.
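
To make the saturation problem concrete, here is a minimal sketch of one way a policy-aware weighting could work. It is an illustrative assumption, not the paper's formulation: the function names, the binary per-criterion scores, and the choice of the Bernoulli variance p(1 - p) of each criterion's pass rate as the "signal utility" term are all hypothetical.

```python
from typing import List


def pass_rates(scores: List[List[float]]) -> List[float]:
    """scores[r][c] is 1.0 if rollout r satisfies criterion c, else 0.0.

    Returns the fraction of rollouts that satisfy each criterion under
    the current policy.
    """
    n_rollouts = len(scores)
    n_criteria = len(scores[0])
    return [sum(row[c] for row in scores) / n_rollouts for c in range(n_criteria)]


def policy_aware_weights(importance: List[float], rates: List[float]) -> List[float]:
    """Scale each importance weight by p * (1 - p), then normalize.

    ASSUMPTION: p * (1 - p) is a stand-in for "optimization signal utility".
    It is zero when a criterion is always or never satisfied, so saturated
    criteria stop contributing to the reward.
    """
    raw = [w * p * (1.0 - p) for w, p in zip(importance, rates)]
    total = sum(raw)
    if total == 0.0:
        # No criterion separates the rollouts: fall back to static importance.
        norm = sum(importance)
        return [w / norm for w in importance]
    return [x / total for x in raw]


def aggregate(scores: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted sum of criterion scores, one scalar reward per rollout."""
    return [sum(w * s for w, s in zip(weights, row)) for row in scores]


# Four rollouts, two equally important criteria.
# Criterion 0 is saturated (every rollout passes); criterion 1 is not.
example_scores = [[1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
example_importance = [0.5, 0.5]

static_rewards = aggregate(example_scores, example_importance)
adaptive_weights = policy_aware_weights(example_importance, pass_rates(example_scores))
adaptive_rewards = aggregate(example_scores, adaptive_weights)

print(static_rewards)    # [1.0, 0.5, 1.0, 0.5]
print(adaptive_weights)  # [0.0, 1.0]
print(adaptive_rewards)  # [1.0, 0.0, 1.0, 0.0]
```

With static importance weights, half of every reward comes from a criterion that no longer distinguishes rollouts. Under the hypothetical policy-aware weights, the whole reward range is spent on the criterion the policy has not yet mastered.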

paper

SOURCE

https://huggingface.co/papers/2605.20164
