Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models | Hackobar