Self-Distillation Plus RL Elicits Task-Solving in Video Diffusion Models
June 9, 2026
A framework uses a vision-language model (VLM) to generate candidate tasks and step-by-step solutions, conditions a pretrained video diffusion model (the Demonstrator) on those solutions, and then distills the resulting behavior into a leaner model via RL. It requires no paired task-execution videos, which sidesteps the costly data collection that supervised fine-tuning would otherwise need for world-model planning.
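The control flow of that loop can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the string "frames", the reward (fraction of student frames matching the Demonstrator's), the student that sees only the task prompt, and the single-parameter REINFORCE update are hypothetical stand-ins, not the framework's actual models or objective.

```python
import math
import random


def propose_tasks(n):
    """Hypothetical stand-in for the VLM: tasks paired with step-by-step solutions."""
    return [(f"task-{i}", [f"step-{j}" for j in range(2 + i % 3)]) for i in range(n)]


def demonstrate(task):
    """Hypothetical Demonstrator: conditioned on the solution, emits one target 'frame' per step."""
    prompt, steps = task
    return [f"{prompt}/{s}" for s in steps]


def student_rollout(task, theta, rng):
    """Toy student (assumed to see only the prompt): each step is rendered
    correctly with probability sigmoid(theta), otherwise it emits noise."""
    prompt, steps = task
    p = 1.0 / (1.0 + math.exp(-theta))
    hits = [rng.random() < p for _ in steps]
    frames = [f"{prompt}/{s}" if h else "noise" for s, h in zip(steps, hits)]
    return frames, hits, p


def reward(frames, target):
    """Assumed reward: fraction of student frames that match the Demonstrator's."""
    return sum(a == b for a, b in zip(frames, target)) / len(target)


def train(n_tasks=6, epochs=30, lr=0.5, seed=0):
    rng = random.Random(seed)
    tasks = propose_tasks(n_tasks)  # no paired execution videos are needed
    theta = -2.0                    # student starts out mostly failing
    for _ in range(epochs):
        for task in tasks:
            target = demonstrate(task)
            frames, hits, p = student_rollout(task, theta, rng)
            r = reward(frames, target)
            # Gradient of the log-probability of the sampled Bernoulli outcomes.
            grad_logp = sum(h - p for h in hits)
            # REINFORCE step, using the expected reward p as the baseline.
            theta += lr * (r - p) * grad_logp
    return theta
```

Calling `train()` returns the student's final logit. In this toy setup the update works out to a non-negative multiple of `(r - p) ** 2`, so the logit can only rise from its starting value of -2.0; a real video model would need a learned reward or verifier and a proper policy-gradient method.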
HOW THIS AFFECTS YOU
● Researcher: The self-distillation plus RL loop offers a scalable alternative to supervised fine-tuning for eliciting planning behavior from video generators without labeled execution data.