
NormGuard Prevents Quality Degradation in Flow-Matching RL

June 25, 2026

Post-training RL for flow-based generators inflates per-step velocity norms by 5% to 15%, degrading perceptual quality. NormGuard addresses this by implementing reward-preserving norm constraints to mitigate the drift observed in methods like DPO and AWM.
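The summary does not say how NormGuard enforces its constraint, so the following is only a minimal sketch of the general idea, not the published method. It measures how much a fine-tuned velocity's L2 norm has grown relative to a reference (pre-RL) velocity at one sampling step, then rescales the fine-tuned velocity back to an allowed norm while keeping its direction. The function names, the `tolerance` parameter, and the use of the pre-RL model as the reference are all assumptions made for illustration.

```python
import math


def l2_norm(v):
    """Euclidean norm of a velocity vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def norm_inflation(v_tuned, v_ref):
    """Relative norm growth of the tuned velocity (0.10 means 10% larger)."""
    ref = l2_norm(v_ref)
    if ref == 0.0:
        raise ValueError("reference velocity has zero norm")
    return l2_norm(v_tuned) / ref - 1.0


def clip_to_reference_norm(v_tuned, v_ref, tolerance=0.0):
    """Hypothetical guard: cap ||v_tuned|| at (1 + tolerance) * ||v_ref||.

    Only the magnitude is changed; the direction of v_tuned is preserved.
    """
    limit = (1.0 + tolerance) * l2_norm(v_ref)
    norm = l2_norm(v_tuned)
    if norm <= limit or norm == 0.0:
        return list(v_tuned)
    scale = limit / norm
    return [x * scale for x in v_tuned]


# Toy single-step example: the tuned velocity is 10% longer than the reference.
v_ref = [3.0, 4.0]    # norm 5.0
v_tuned = [3.3, 4.4]  # norm 5.5, same direction
inflation = norm_inflation(v_tuned, v_ref)
guarded = clip_to_reference_norm(v_tuned, v_ref)
```

In this toy case the measured inflation is about 10%, inside the 5% to 15% range quoted above, and the guarded velocity returns to the reference norm. A hard rescale like this is the simplest possible constraint; whether it preserves reward in practice is not something this sketch can show.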

HOW THIS AFFECTS YOU

● Researcher: You can maintain higher perceptual quality in flow-matching models during RL fine-tuning by controlling velocity norm inflation.

Original: huggingface.co
