[arXiv] score: 0.24
Continual Distillation of Teachers from Different Domains
May 7, 2026
Researchers introduce Continual Distillation (CD), a framework in which a student model sequentially absorbs knowledge from heterogeneous teacher models without retaining access to earlier teachers or to their training data. Its method, SE2D, mitigates Unseen Knowledge Forgetting by preserving logits on external unlabeled data across teacher transitions. The approach targets catastrophic forgetting in multi-domain distillation pipelines and reportedly outperforms naive sequential distillation baselines. ML engineers who deploy compressed models against evolving sets of domain-specific teachers may find it relevant for storage-constrained continual-learning systems in production.
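To make the logit-preservation idea concrete, here is a minimal NumPy sketch of a loss for one teacher transition. It is not the paper's SE2D formulation, which this summary does not spell out. It assumes that logits on the external unlabeled data are recorded before the switch to a new teacher, that drift from those stored logits is penalized with a mean squared error, and that the new teacher is distilled with a temperature-softened KL term. The function names, `temperature`, and `preserve_weight` are all hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Mean KL(p || q) over the batch dimension."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))


def continual_distillation_loss(
    student_domain_logits,    # student outputs on the current teacher's domain
    teacher_domain_logits,    # current teacher outputs on the same inputs
    student_external_logits,  # student outputs on external unlabeled data
    stored_external_logits,   # logits recorded on that data before the transition
    temperature=2.0,          # assumed softening temperature
    preserve_weight=1.0,      # assumed weight of the preservation term
):
    """Distillation term plus an assumed logit-preservation penalty."""
    # Standard distillation: match the current teacher's softened distribution.
    distill = kl_divergence(
        softmax(teacher_domain_logits, temperature),
        softmax(student_domain_logits, temperature),
    )
    # Preservation (assumption): penalize drift from the stored logits,
    # so neither the old teacher nor its training data is needed.
    preserve = float(np.mean((student_external_logits - stored_external_logits) ** 2))
    return distill + preserve_weight * preserve
```

Under these assumptions, only the stored logits on the external data need to be kept between teachers, which is what would make such a scheme attractive when storage is constrained.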
cs.LG cs.CV