●builder: You can use RLHF and on-policy distillation to refine diffusion models for specific tasks such as portrait generation or image editing.
●researcher: This demonstrates the effectiveness of GRPO-based RL for scaling visual reward models in diffusion pipelines.
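
For readers unfamiliar with GRPO, its core step is scoring each sample relative to the other samples drawn for the same prompt, rather than against a learned value baseline. The sketch below illustrates only that group-relative normalization. The function name, the `eps` constant, the choice of population standard deviation, and the reward values are all assumptions made for illustration, not details taken from the work summarized here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples from the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Population std is an assumption; implementations differ on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four images sampled from one prompt.
scores = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(scores)

# Samples scoring above the group mean get a positive advantage,
# samples below it get a negative one.
for score, adv in zip(scores, advantages):
    print(f"reward={score:.2f} advantage={adv:+.3f}")
```

In a full training loop these advantages would weight the policy-gradient update for each sampled image; that part depends on the sampler and model and is left out here.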