[arXiv] score: 0.41
Multi-Rollout On-Policy Distillation via Peer Successes and Failures
May 14, 2026
Introduces Multi-Rollout On-Policy Distillation (MOPD), a peer-conditioned framework that leverages multiple sampled trajectories per prompt to provide denser token-level supervision for LLM post-training.
cs.LG, cs.AI