[arXiv] score: 0.24

Designing a double deep reinforcement learning selection tool for resilient demand prediction

May 7, 2026
This paper applies a Double Deep Q-Network (DDQN) to automated forecasting-model selection in supply chains: the agent dynamically chooses a model from a forecasting committee at inference time instead of relying on a single static model choice. The architecture adds early stopping based on reward convergence to reduce training overhead. The approach is validated on grocery and snack demand datasets. ML engineers building demand forecasting pipelines should evaluate it as a meta-learning alternative to ensemble averaging or AutoML search, particularly where dataset heterogeneity makes fixed model choices brittle.
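The two mechanisms named in the summary can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: actions are taken to be indices into the forecasting committee, the reward is assumed to be something like a negative forecast error, and the names `double_q_target` and `RewardConvergenceStopper`, along with the `window` and `tol` parameters, are hypothetical. The summary does not say how the paper defines reward convergence, so the moving-average comparison below is one plausible choice.

```python
from collections import deque
from typing import Sequence


def double_q_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Standard Double DQN target for one transition.

    The online network picks the best next action (here: which committee
    model to use next); the target network supplies that action's value.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


class RewardConvergenceStopper:
    """Assumed stopping rule: halt training when the mean episode reward of
    the latest `window` episodes differs from the mean of the preceding
    `window` episodes by less than `tol`."""

    def __init__(self, window: int = 20, tol: float = 1e-3) -> None:
        self.window = window
        self.tol = tol
        self.history = deque(maxlen=2 * window)

    def update(self, episode_reward: float) -> bool:
        """Record one episode reward; return True if training should stop."""
        self.history.append(episode_reward)
        if len(self.history) < 2 * self.window:
            return False  # not enough episodes to compare two windows yet
        values = list(self.history)
        previous_mean = sum(values[: self.window]) / self.window
        current_mean = sum(values[self.window:]) / self.window
        return abs(current_mean - previous_mean) < self.tol
```

In a training loop, `stopper.update(total_reward)` would be called once per episode and the loop would break on `True`. Decoupling action selection (online network) from action evaluation (target network) is what distinguishes Double DQN from plain DQN, where a single network does both.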
cs.LG cs.AI