[arXiv]

Scaling Laws for Task-Specific LLM Distillation

June 24, 2026

Empirical scaling laws for compressing LLMs via iterative structural pruning show that general-knowledge benchmark scores collapse at lower compression ratios than in-domain task quality does, and that supervision format is the primary driver of that gap. In experiments on quantitative finance, a blended chain-of-thought KL-divergence loss over reasoning traces recovers general knowledge that standard logit-based or LoRA distillation loses. The laws quantify degradation curves across dataset size, compression ratio, and pruning schedule.
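
To make the idea of a blended KL-divergence loss concrete, here is a minimal sketch of a generic distillation objective over the tokens of a reasoning trace. It is not the paper's formulation, which this summary does not specify. The function names, the mixing weight `alpha`, and the `temperature` parameter are all assumptions chosen for illustration, following the common pattern of mixing a teacher-student KL term with a cross-entropy term on the reference tokens.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def kl_divergence(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for a single token position."""
    t_logp = log_softmax(teacher_logits, temperature)
    s_logp = log_softmax(student_logits, temperature)
    return sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_logp, s_logp))

def blended_loss(teacher_logits, student_logits, target_ids,
                 alpha=0.5, temperature=1.0):
    """Hypothetical blend: alpha * mean KL + (1 - alpha) * mean cross-entropy.

    teacher_logits, student_logits: one list of vocabulary logits per
    token of the reasoning trace. target_ids: reference token id per position.
    """
    n = len(target_ids)
    kl = sum(kl_divergence(t, s, temperature)
             for t, s in zip(teacher_logits, student_logits)) / n
    ce = -sum(log_softmax(s)[y]
              for s, y in zip(student_logits, target_ids)) / n
    return alpha * kl + (1 - alpha) * ce

# Toy trace: two token positions over a four-word vocabulary.
teacher = [[2.0, 0.5, 0.1, -1.0], [0.0, 3.0, 0.2, 0.1]]
student = [[1.0, 1.0, 0.0, 0.0], [0.5, 1.5, 0.5, 0.0]]
targets = [0, 1]
loss = blended_loss(teacher, student, targets, alpha=0.7, temperature=2.0)
```

With `alpha=1.0` the objective reduces to pure KL matching of the teacher's distribution, and with `alpha=0.0` it reduces to ordinary cross-entropy on the trace tokens. How the paper actually weights or schedules these terms is not stated here.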
