● builder: If you're fine-tuning smaller models using GPT-4 outputs, this approach may improve transfer quality without requiring any changes to the teacher API access.
● researcher: Proxy-KD offers a structured alternative to naive output imitation when distilling from closed APIs; its architecture and benchmark comparisons are worth examining.