[arXiv] score: 0.41

TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment

May 13, 2026
Trajectory Matching Policy Optimization (TMPO) constrains reinforcement-learning (RL) alignment of diffusion models to probability distributions over acceptable trajectories. This prevents the mode collapse and visual degradation that reward hacking otherwise induces.
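A toy example can show why matching a distribution over trajectories resists collapse when plain reward maximization does not. This is a generic illustration, not TMPO itself. It assumes a softmax policy over a fixed set of four sampled trajectories, hypothetical reward values, and a hypothetical reward-tilted target p*(τ) ∝ exp(r(τ)/β) with a hypothetical temperature β. The paper's actual target distribution and objective are not specified in this summary.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    """Shannon entropy (nats) of a categorical distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def reward_max_step(logits, rewards, lr):
    """One exact gradient-ascent step on E_pi[r] for a softmax policy.
    d E_pi[r] / d logit_j = pi_j * (r_j - E_pi[r])."""
    pi = softmax(logits)
    mean_r = sum(p * r for p, r in zip(pi, rewards))
    return [lg + lr * p * (r - mean_r) for lg, p, r in zip(logits, pi, rewards)]

def matching_step(logits, target, lr):
    """One gradient-descent step on KL(target || pi).
    d KL / d logit_j = pi_j - target_j."""
    pi = softmax(logits)
    return [lg - lr * (p - t) for lg, p, t in zip(logits, pi, target)]

# Hypothetical rewards for four sampled trajectories (toy numbers, not from the paper).
rewards = [1.0, 0.9, 0.8, 0.2]
beta = 0.5  # hypothetical temperature of the reward-tilted target distribution
target = softmax([r / beta for r in rewards])

logits_rm = [0.0] * len(rewards)     # policy trained by reward maximization
logits_match = [0.0] * len(rewards)  # policy trained by distribution matching
for _ in range(5000):
    logits_rm = reward_max_step(logits_rm, rewards, lr=1.0)
    logits_match = matching_step(logits_match, target, lr=1.0)

pi_rm = softmax(logits_rm)
pi_match = softmax(logits_match)
print("reward maximization:  ", [round(p, 3) for p in pi_rm], "entropy", round(entropy(pi_rm), 3))
print("distribution matching:", [round(p, 3) for p in pi_match], "entropy", round(entropy(pi_match), 3))
```

Reward maximization pushes nearly all probability onto the single highest-reward trajectory. The matching objective instead converges to the target and keeps probability on the other high-reward trajectories. That retained diversity is the property the summary attributes to TMPO, though TMPO's own mechanism may differ from this sketch.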
cs.LG, cs.AI, cs.CV