
The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

May 13, 2026

A comprehensive empirical study of when on-policy distillation (OPD) and on-policy self-distillation (OPSD) succeed or fail in LLM post-training. It identifies the mechanisms behind their instability and proposes fixes for these dense token-level supervision methods.

cs.AI

SOURCE

https://arxiv.org/abs/2605.11182
