[arXiv] score: 0.43
Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models
May 15, 2026
Identifies a trajectory-locking failure mode in reward-maximizing post-training of diffusion language models and proposes TraFL (Trajectory Flow baLancing), a trajectory-balance objective that maintains coverage of alternative correct solutions.
cs.LG cs.CL