[HUGGINGFACE]score: 0.42
Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning
June 8, 2026
Flow-matching policies fine-tuned via RL often suffer instability from backpropagating through denoising steps. QGF (Q-Guided Flow) sidesteps this by applying Q-function gradient guidance only at test time, leaving supervised training unchanged. The approach targets continuous robot control tasks and aims to match RL-tuned baselines without specialized training objectives.
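As a rough illustration of what test-time Q-gradient guidance could look like, the sketch below integrates a flow policy with Euler steps and adds a scaled action-gradient of Q to the velocity at each step. The summary does not give QGF's actual guidance rule, so this is only one plausible reading. Everything here is a hypothetical stand-in: `base_velocity` plays the role of a trained flow-matching velocity field, `q_value` and `q_grad` play the role of a learned critic and its gradient (which in practice would come from autodiff), and `guidance_scale` and `num_steps` are illustrative parameters.

```python
import numpy as np


def base_velocity(action, t, state):
    # Hypothetical stand-in for a trained flow-matching velocity field.
    # It simply pulls the action toward a state-dependent "behavior" action.
    # (t is unused in this toy field but kept for a realistic signature.)
    behavior_action = np.tanh(state)
    return behavior_action - action


def q_value(state, action):
    # Hypothetical toy critic: a quadratic bowl peaked at action = 1.
    # (state is unused in this toy critic.)
    best_action = np.ones_like(action)
    return -0.5 * np.sum((action - best_action) ** 2)


def q_grad(state, action):
    # Analytic gradient of q_value with respect to the action.
    # A learned critic would use automatic differentiation instead.
    return np.ones_like(action) - action


def sample_action(state, initial_action, guidance_scale, num_steps=10):
    # Euler integration from t = 0 to t = 1. The policy weights are never
    # updated; guidance only changes the sampling trajectory.
    action = np.array(initial_action, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        velocity = base_velocity(action, t, state)
        # Assumed guidance rule: nudge the velocity along the Q gradient.
        velocity = velocity + guidance_scale * q_grad(state, action)
        action = action + dt * velocity
    return action


if __name__ == "__main__":
    state = np.zeros(2)
    start = np.zeros(2)
    plain = sample_action(state, start, guidance_scale=0.0)
    guided = sample_action(state, start, guidance_scale=1.0)
    print("unguided Q:", q_value(state, plain))
    print("guided Q:  ", q_value(state, guided))
```

Setting `guidance_scale=0.0` recovers the unguided sampler, which is the sense in which supervised training is left unchanged: the same velocity field serves both cases, and the critic enters only at sampling time.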