[arXiv]

Prompt-Level Rubrics and Constraint Checkers Built Offline for RLHF

May 29, 2026

The framework generates task-specific rubrics and executable hard-constraint checkers from prompts alone, before training begins, and then combines artifact-anchored rubric scores with a global holistic score into a normalized hybrid reward. This separates reward specification from reward computation, making the criteria explicit and reusable across rollouts for instruction-following and writing tasks.
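The hybrid reward described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formula: the `RubricItem` and `PromptSpec` types, the `hybrid_reward` function, the mixing weight `alpha`, the holistic scale `holistic_max`, and the rule that any failed hard constraint zeroes the reward are all hypothetical choices made here. The per-item rubric scores and the holistic score are taken as inputs, since in practice they would come from a judge.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class RubricItem:
    """One prompt-specific criterion (hypothetical format)."""
    description: str
    weight: float     # relative importance within the rubric
    max_score: float  # top of the scale a judge uses for this item


@dataclass
class PromptSpec:
    """Everything built offline for one prompt, before any rollout exists."""
    rubric: List[RubricItem]
    # Executable hard-constraint checkers: each returns True if a response passes.
    checkers: List[Callable[[str], bool]] = field(default_factory=list)


def hybrid_reward(
    spec: PromptSpec,
    response: str,
    rubric_scores: Sequence[float],
    holistic_score: float,
    holistic_max: float = 10.0,  # assumed holistic scale
    alpha: float = 0.5,          # assumed rubric/holistic mixing weight
) -> float:
    """Combine per-item rubric scores and one holistic score into a reward in [0, 1]."""
    if len(rubric_scores) != len(spec.rubric):
        raise ValueError("need exactly one score per rubric item")
    total_weight = sum(item.weight for item in spec.rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    # Assumed gating rule: any failed hard constraint zeroes the reward.
    if not all(check(response) for check in spec.checkers):
        return 0.0
    # Weighted mean of per-item scores, each first scaled to [0, 1].
    rubric_part = sum(
        item.weight * (score / item.max_score)
        for item, score in zip(spec.rubric, rubric_scores)
    ) / total_weight
    holistic_part = holistic_score / holistic_max
    # A convex mix keeps the result in [0, 1] when the inputs are in range.
    return alpha * rubric_part + (1 - alpha) * holistic_part


# Usage: two weighted criteria plus a 50-word hard limit.
spec = PromptSpec(
    rubric=[RubricItem("States the deadline", 2.0, 5.0),
            RubricItem("Uses a formal tone", 1.0, 5.0)],
    checkers=[lambda r: len(r.split()) <= 50],
)
print(hybrid_reward(spec, "Dear team, the report is due Friday.", [5.0, 4.0], 8.0))  # about 0.87
```

Gating on hard constraints keeps a fluent but non-compliant response from earning reward through the holistic term; a soft penalty would be an equally plausible alternative, and the summary does not say which the authors use.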

cs.CL

HOW THIS AFFECTS YOU

● builder: Offline rubric construction means reward criteria can be audited before training and reused across rollouts, rather than regenerated by a judge model on every rollout, which can reduce post-training compute.

● researcher: Decomposing the reward into pre-specified rubrics plus a residual holistic score is a concrete alternative to single-scalar RLHF rewards for open-ended tasks.
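The offline/online split in the builder note can be sketched like this. It is a toy stand-in under loud assumptions: the paper generates checkers from prompts with its own pipeline, whereas the hypothetical `build_spec` here just pattern-matches two phrasings ("under N words", "no bullet points"). What the sketch does show is the separation itself: every specification is built once, before training, and the stored checkers are then applied unchanged to any number of rollouts.

```python
import re
from typing import Callable, Dict, List, Sequence

Checker = Callable[[str], bool]


def build_spec(prompt: str) -> List[Checker]:
    """Hypothetical offline step: derive hard-constraint checkers from the prompt text alone."""
    checkers: List[Checker] = []
    match = re.search(r"under (\d+) words", prompt)
    if match:
        limit = int(match.group(1))
        # Bind limit as a default argument so each lambda keeps its own value.
        checkers.append(lambda r, limit=limit: len(r.split()) < limit)
    if "no bullet points" in prompt.lower():
        checkers.append(
            lambda r: not any(line.lstrip().startswith(("-", "*")) for line in r.splitlines())
        )
    return checkers


def build_all_specs(prompts: Sequence[str]) -> Dict[str, List[Checker]]:
    # Runs once, before training; the results can be inspected and stored.
    return {p: build_spec(p) for p in prompts}


def check_rollouts(specs: Dict[str, List[Checker]], prompt: str, rollouts: Sequence[str]) -> List[bool]:
    # During training, stored checkers are reused as-is; nothing is regenerated per rollout.
    checkers = specs[prompt]
    return [all(check(r) for check in checkers) for r in rollouts]
```

For example, `build_all_specs(["Summarize the memo in under 10 words, no bullet points."])` yields two checkers for that prompt, and a rollout such as `"- Budget approved"` then fails the bullet-point constraint without any further model call.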

SOURCE

https://arxiv.org/abs/2605.29275
