[arXiv]
Prompt-Level Rubrics and Constraint Checkers Built Offline for RLHF
May 29, 2026
The framework generates task-specific rubrics and executable hard-constraint checkers from prompts alone, before training begins, and combines artifact-anchored rubric scores with a global holistic score into a normalized hybrid reward. This separates reward specification from reward computation, making the criteria explicit and reusable across rollouts on instruction-following and writing tasks.
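A minimal Python sketch of how such a hybrid reward might be assembled. The summary does not give the exact formula, so the structures (`Criterion`, `RewardSpec`, `hybrid_reward`), the 1-10 holistic scale, the mixing weight `alpha`, and the rule that a failed hard constraint zeroes the reward are all assumptions for illustration. At training time the per-criterion scores would presumably come from a judge, while the spec itself is fixed in advance.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Criterion:
    """One rubric item; `weight` is a hypothetical importance weight."""
    description: str
    weight: float


@dataclass(frozen=True)
class RewardSpec:
    """Hypothetical per-prompt specification, built once before training."""
    rubric: Sequence[Criterion]
    hard_constraints: Sequence[Callable[[str], bool]]


def hybrid_reward(
    spec: RewardSpec,
    response: str,
    rubric_scores: Sequence[float],  # one score in [0, 1] per criterion (e.g. from a judge)
    holistic_score: float,           # global score on an assumed 1-10 scale
    alpha: float = 0.7,              # assumed weight on the rubric part
) -> float:
    """Combine rubric and holistic scores into one reward in [0, 1] (a sketch)."""
    if len(rubric_scores) != len(spec.rubric):
        raise ValueError("expected exactly one score per rubric criterion")
    # Assumption: any failed hard-constraint checker zeroes the reward.
    if not all(check(response) for check in spec.hard_constraints):
        return 0.0
    # Weighted mean of the per-criterion scores.
    total_weight = sum(c.weight for c in spec.rubric)
    rubric_part = sum(c.weight * s for c, s in zip(spec.rubric, rubric_scores)) / total_weight
    # Map the holistic score from [1, 10] onto [0, 1], clamping out-of-range values.
    holistic_part = min(max((holistic_score - 1.0) / 9.0, 0.0), 1.0)
    return alpha * rubric_part + (1.0 - alpha) * holistic_part


# Usage: two weighted criteria and two executable checkers.
spec = RewardSpec(
    rubric=[Criterion("Answers every part of the prompt", 2.0),
            Criterion("Keeps a formal tone", 1.0)],
    hard_constraints=[lambda r: len(r.split()) <= 50,  # "at most 50 words"
                      lambda r: "TODO" not in r],      # no placeholder text
)
print(hybrid_reward(spec, "A short formal answer.", [1.0, 0.5], holistic_score=8.0))
```

With weights 2 and 1, rubric scores `[1.0, 0.5]`, and a holistic score of 8, the rubric part is 2.5/3 ≈ 0.833, the holistic part is 7/9 ≈ 0.778, and the reward is 0.7 × 0.833 + 0.3 × 0.778 ≈ 0.817. Gating on hard constraints, rather than subtracting a penalty, is only one of several plausible design choices.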
cs.CL
HOW THIS AFFECTS YOU
● Builder: Offline rubric construction means reward criteria can be audited and reused without re-querying a judge model to re-derive them on every rollout, reducing post-training compute.
● Researcher: Decomposing reward into pre-specified rubrics plus residual holistic scoring is a concrete alternative to scalar RLHF for open-ended tasks.