On-Policy Distillation Explained: The Key Post-Training Technique Behind Qwen 3, DeepSeek-V4, GLM-5.1 | HACKOBAR_