[arXiv] score: 0.44

The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

May 13, 2026
A comprehensive empirical study identifies when on-policy distillation (OPD) and on-policy self-distillation (OPSD) succeed or fail in LLM post-training, reveals the mechanisms behind their instability, and proposes fixes for these dense token-level supervision methods.
cs.AI