Multi-Rollout On-Policy Distillation via Peer Successes and Failures | Hackobar