[r/MachineLearning] score: 0.09
HPO - hyperparameter drift [D]
April 24, 2026
**HPO Hyperparameter Drift in Long-Training Regimes**
A practitioner describes a proxy-task mismatch in HPO: hyperparameters (learning rate schedules, regularization, etc.) are tuned on 1-2 hour truncated runs that cover far fewer epochs than the full-day training runs, creating a distribution shift in which the optima found at the short horizon may not transfer to the full training regime. This matters because learning rate schedulers (cosine decay, warmup schedules, etc.) are particularly epoch-count-sensitive: a schedule tuned for 10 epochs behaves fundamentally differently when stretched to 100 epochs, so HPO results can actively mislead the full training runs. Teams maintaining multiple models on bi-monthly retraining cycles with periodic architecture changes face compounding risk here, since each architecture change invalidates prior HPO results and forces a fresh truncated search that may systematically underfit the full-run dynamics.
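A minimal sketch of the epoch-count sensitivity claim, not taken from the post: a generic cosine-decay-with-warmup schedule (the function, its parameters, and the base LR of 3e-4 are illustrative assumptions) evaluated at the same absolute epoch under a 10-epoch plan versus a 100-epoch plan. The divergence is the point of the complaint: the short proxy run and the full run expose the optimizer to very different effective learning rates.

```python
import math

def cosine_with_warmup(epoch, total_epochs, base_lr=3e-4, warmup_epochs=2):
    """Hypothetical schedule: linear warmup, then cosine decay over `total_epochs`."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs          # linear warmup
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))  # cosine decay

# Same absolute epoch, two different planned horizons.
for epoch in (0, 2, 5, 9):
    short = cosine_with_warmup(epoch, total_epochs=10)   # truncated proxy run
    long = cosine_with_warmup(epoch, total_epochs=100)   # full training run
    print(f"epoch {epoch:3d}: 10-epoch plan lr={short:.2e}, 100-epoch plan lr={long:.2e}")

# By epoch 5 the 10-epoch plan has already decayed to ~70% of the base LR and
# drops below 4% by epoch 9, while the 100-epoch plan is still essentially at
# peak -- hyperparameters tuned against the short schedule are coupled to that
# truncated horizon.
```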
discussion