● builder: You can apply MixSD during fine-tuning to reduce capability regression on reasoning benchmarks when injecting domain-specific facts.
● researcher: Worth watching because the external-teacher-free distillation framing offers a principled alternative to standard SFT loss for knowledge injection without auxiliary models.