Source: huggingface.co

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

June 8, 2026

Flow-matching policies fine-tuned with RL are often unstable because gradients must be backpropagated through the denoising steps. QGF (Q-Guided Flow) sidesteps this by applying Q-function gradient guidance only at test time, leaving the supervised training procedure unchanged. The approach targets continuous robot control tasks and aims to match RL-tuned baselines without specialized training objectives.
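The idea of test-time guidance can be illustrated with a toy sketch. This is not the paper's algorithm: the velocity field, the quadratic critic, the `guidance_scale` parameter, and the choice to add the scaled Q-gradient to the velocity at each Euler step are all assumptions made for illustration. The only point carried over from the summary is that the policy is sampled as trained and the critic's action gradient is used only during sampling.

```python
import numpy as np

def velocity(a, t, target):
    # Hypothetical stand-in for a trained flow-matching velocity field:
    # a straight-line flow that carries the sample toward `target` at t = 1.
    return (target - a) / max(1.0 - t, 1e-3)

def q_value(a, best):
    # Toy critic (assumption): Q(a) = -0.5 * ||a - best||^2.
    return -0.5 * float(np.sum((a - best) ** 2))

def q_grad(a, best):
    # Analytic gradient of the toy critic with respect to the action.
    return best - a

def sample_action(a0, target, best, guidance_scale=0.0, n_steps=10):
    # Euler integration of the flow from t = 0 to t = 1.
    # Guidance (assumed form): add guidance_scale * dQ/da to the velocity
    # at every step. No policy parameters are updated anywhere.
    a = a0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        a = a + dt * (velocity(a, t, target) + guidance_scale * q_grad(a, best))
    return a

a0 = np.zeros(2)                 # initial noise sample (fixed here for clarity)
target = np.array([1.0, 0.0])    # action the unguided policy would produce
best = np.array([1.0, 1.0])      # action the toy critic prefers

plain = sample_action(a0, target, best, guidance_scale=0.0)
guided = sample_action(a0, target, best, guidance_scale=1.0)
print("unguided Q:", q_value(plain, best))
print("guided Q:  ", q_value(guided, best))
```

In this toy setup the guided sample scores higher under the critic than the unguided one, while a scale of zero recovers the original policy output. How large the scale can be before samples drift away from the trained policy's behavior is a tuning question that this sketch does not address.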
