NormGuard Prevents Quality Degradation in Flow-Matching RL
June 25, 2026
Post-training RL for flow-based generators inflates per-step velocity norms by 5% to 15%, which degrades perceptual quality. NormGuard counters this with reward-preserving norm constraints that mitigate the drift observed in methods such as DPO and AWM.
HOW THIS AFFECTS YOU
● Researcher: You can maintain higher perceptual quality in flow-matching models during RL fine-tuning by controlling velocity norm inflation.
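
To make "velocity norm inflation" concrete, here is a minimal sketch of how one might measure it and cap it. This is not NormGuard's actual method, which this summary does not specify. The function names (`norm_inflation`, `clamp_to_reference_norm`) and the `max_inflation` parameter are hypothetical, and the sketch assumes you can evaluate both a frozen reference model and the RL-tuned model at the same sampling steps. Whether a simple rescale like this preserves reward is also an open assumption.

```python
import math


def l2_norm(v):
    """Euclidean norm of a velocity vector given as a flat list of floats."""
    return math.sqrt(sum(x * x for x in v))


def norm_inflation(ref_velocities, tuned_velocities):
    """Per-step relative norm change: ||v_tuned|| / ||v_ref|| - 1.

    A value of 0.10 means the tuned model's velocity norm is 10% larger
    than the reference model's at that step.
    """
    out = []
    for ref_v, tuned_v in zip(ref_velocities, tuned_velocities):
        ref_n = l2_norm(ref_v)
        if ref_n == 0.0:
            raise ValueError("reference velocity has zero norm")
        out.append(l2_norm(tuned_v) / ref_n - 1.0)
    return out


def clamp_to_reference_norm(ref_v, tuned_v, max_inflation=0.0):
    """Hypothetical constraint: keep the tuned direction, cap the norm.

    The result has norm at most (1 + max_inflation) * ||ref_v||.
    Velocities already within the limit are returned unchanged.
    """
    limit = (1.0 + max_inflation) * l2_norm(ref_v)
    tuned_n = l2_norm(tuned_v)
    if tuned_n <= limit or tuned_n == 0.0:
        return list(tuned_v)
    scale = limit / tuned_n
    return [x * scale for x in tuned_v]


# Toy example: the tuned velocity points the same way but is 10% longer.
ref = [3.0, 4.0]
tuned = [3.3, 4.4]
inflation = norm_inflation([ref], [tuned])
capped = clamp_to_reference_norm(ref, tuned, max_inflation=0.02)
```

In the toy example the measured inflation is about 10%, inside the 5% to 15% range reported above, and the capped velocity keeps its direction while its norm is limited to 2% above the reference.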