Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment
May 19, 2026
DPO's equivalence to RLHF is conditional on the assumption that the RLHF-optimal policy prefers the human-preferred responses. When this assumption is violated, DPO optimizes the policy's relative advantage over the reference policy rather than absolute alignment, causing pathological convergence.
paper
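The distinction between relative advantage and absolute preference in the summary above can be illustrated with a minimal sketch. It assumes the standard per-pair DPO loss, -log σ(β · margin), where the margin is the difference of policy-to-reference log-ratios for the preferred response y_w and the rejected response y_l. The function names and all probabilities below are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
import math


def implicit_margin(logp_w, logp_l, ref_logp_w, ref_logp_l):
    """Difference of policy/reference log-ratios for the pair (y_w, y_l)."""
    return (logp_w - ref_logp_w) - (logp_l - ref_logp_l)


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard per-pair DPO loss: -log(sigmoid(beta * margin))."""
    z = beta * implicit_margin(logp_w, logp_l, ref_logp_w, ref_logp_l)
    # Numerically stable form of log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Hypothetical probabilities: the reference strongly prefers the rejected response.
ref_logp_w, ref_logp_l = math.log(0.01), math.log(0.90)
# The policy has raised y_w a lot relative to the reference,
# yet still assigns y_l the higher absolute probability.
logp_w, logp_l = math.log(0.10), math.log(0.60)

margin = implicit_margin(logp_w, logp_l, ref_logp_w, ref_logp_l)
loss = dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l)

print(f"margin = {margin:.3f}, loss = {loss:.3f}")
print("policy prefers y_w in absolute terms:", logp_w > logp_l)
```

In this made-up setting the margin is positive and the loss falls below log 2 (its value at zero margin), even though the policy still ranks the rejected response higher in absolute probability. The sketch shows only that the loss rewards improvement relative to the reference; it does not reproduce the paper's analysis of convergence.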