
Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

May 19, 2026

The equivalence of DPO to RLHF is conditional on an implicit assumption: that the RLHF-optimal policy prefers the human-preferred responses. When this assumption is violated, DPO optimizes the policy's relative advantage over the reference policy rather than absolute alignment, which causes pathological convergence.
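
The distinction between relative advantage and absolute preference can be illustrated with the standard DPO loss on a single preference pair. This is a minimal sketch, not the paper's construction: the function name `dpo_loss`, the value of `beta`, and all log-probabilities below are hypothetical numbers chosen for illustration. It shows that the loss depends only on how much the policy's log-ratio margin improves over the reference, so the loss can fall while the policy still assigns more probability to the rejected response.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred w, rejected l) pair.

    The loss sees only the margin of log-ratios against the reference,
    never the absolute ordering of logp_w and logp_l.
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))

# Hypothetical reference policy that strongly prefers the rejected response.
ref_w, ref_l = -5.0, -1.0
# Hypothetical trained policy: moved toward the preferred response,
# but still ranks the rejected response higher in absolute terms.
pol_w, pol_l = -4.0, -2.0

loss_at_ref = dpo_loss(ref_w, ref_l, ref_w, ref_l)      # margin is 0
loss_at_policy = dpo_loss(pol_w, pol_l, ref_w, ref_l)   # margin is 2
still_prefers_rejected = pol_w < pol_l

print(loss_at_policy < loss_at_ref, still_prefers_rejected)
```

In this toy setting the loss at the trained policy is below the loss at the reference, yet the policy still prefers the rejected response, which is the gap between relative and absolute alignment that the summary describes.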


SOURCE

https://huggingface.co/papers/2605.20834
