[arXiv] score: 0.49
GeMPO Generalizes Diffusion RL Reweighting Beyond Softmax via Measure Matching
May 26, 2026
GeMPO uses measure matching to replace softmax reweighting in diffusion policy RL with a general framework of monotonic functions, which allows learning from negative samples and avoids overly greedy policies.
cs.LG
HOW THIS AFFECTS YOU
● Researcher: The measure-matching perspective unifies diffusion RL reweighting schemes and provides theoretical grounding for leveraging negative sample feedback, which standard softmax approaches discard.