[arXiv] score: 0.24

Designing a double deep reinforcement learning selection tool for resilient demand prediction

May 7, 2026

This paper applies a Double Deep Q-Network (DDQN) to automated forecasting model selection in supply chains: instead of fixing a single model in advance, the agent dynamically chooses a model from a forecasting committee at inference time. The architecture adds early stopping based on reward convergence to reduce training overhead, and it is validated on grocery and snack demand datasets. ML engineers building demand forecasting pipelines should evaluate it as a meta-learning alternative to ensemble averaging or AutoML search, particularly where dataset heterogeneity makes fixed model choices brittle.
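
To make the two mechanisms concrete, here is a minimal sketch, not the paper's implementation. It assumes each action corresponds to one model in the forecasting committee, and it uses the standard Double DQN target, in which the online network picks the next action and the target network scores it. The early-stopping rule (a moving average of episode rewards that stays within a tolerance for several checks) is only one plausible reading of "reward convergence". The names `ddqn_target` and `RewardConvergenceStopper`, and the parameters `window`, `tol`, and `patience`, are hypothetical.

```python
from collections import deque
from typing import Optional, Sequence


def ddqn_target(
    reward: float,
    gamma: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    done: bool = False,
) -> float:
    """Standard Double DQN target for one transition.

    q_online_next / q_target_next hold one Q-value per action for the next
    state; here each action is assumed to be one committee model.
    """
    if done:
        return reward
    # The online network selects the action ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and the target network evaluates it, which curbs overestimation.
    return reward + gamma * q_target_next[best_action]


class RewardConvergenceStopper:
    """Assumed stopping rule: stop once the moving-average episode reward
    changes by less than `tol` for `patience` consecutive updates."""

    def __init__(self, window: int = 5, tol: float = 0.01, patience: int = 3):
        self.window = window
        self.tol = tol
        self.patience = patience
        self.rewards: deque = deque(maxlen=window)
        self.prev_mean: Optional[float] = None
        self.stable_steps = 0

    def update(self, episode_reward: float) -> bool:
        """Record one episode reward; return True when training should stop."""
        self.rewards.append(episode_reward)
        if len(self.rewards) < self.window:
            return False  # not enough history to form a moving average yet
        mean = sum(self.rewards) / self.window
        if self.prev_mean is not None and abs(mean - self.prev_mean) < self.tol:
            self.stable_steps += 1
        else:
            self.stable_steps = 0
        self.prev_mean = mean
        return self.stable_steps >= self.patience
```

In a training loop, `update` would be called once per episode and training would end on the first `True`. The per-step reward itself (for example, the negative forecast error of the chosen model) is left open here because the summary does not specify it.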

cs.LG cs.AI

SOURCE

https://arxiv.org/abs/2605.04068
