[arXiv] score: 0.68
Single Dimensionless Parameter E≥0.5 Guarantees Zero Dead Experts in MoE Models
May 26, 2026
A dimensionless control parameter, E = T*H/(O+B), combines four mixture-of-experts (MoE) routing hyperparameters and predicts expert collapse. Empirically, E ≥ 0.5 was sufficient to eliminate dead experts across more than 11,000 training epochs on vision and language benchmarks.
cs.LG, cs.AI, cs.CL, cs.CV
HOW THIS AFFECTS YOU
● builder: You can use E ≥ 0.5 as a practical hyperparameter constraint when training MoE models to avoid dead-expert collapse without hand-tuning balance losses.
● researcher: This cross-modal finding reduces MoE load balancing to a single interpretable scalar, potentially replacing ad hoc auxiliary-loss tuning with a principled design rule.
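
As a rough illustration of how the E ≥ 0.5 rule might be checked before a training run, here is a minimal Python sketch. The summary gives only the formula E = T*H/(O+B) and does not say what T, H, O, and B stand for, so the code treats them as opaque numeric hyperparameters. The function names, the threshold constant, and the zero-denominator check are assumptions made for illustration and do not come from the paper.

```python
# Hypothetical helper: the names below are illustrative, not from the paper.
E_THRESHOLD = 0.5  # threshold reported in the summary (E >= 0.5)


def control_parameter(t: float, h: float, o: float, b: float) -> float:
    """Return E = T*H/(O+B) for four routing hyperparameters.

    The summary does not define T, H, O, or B, so they are passed
    through as plain numbers without interpretation.
    """
    denom = o + b
    if denom == 0:
        # Assumption: treat a zero denominator as a configuration error.
        raise ValueError("O + B must be nonzero")
    return t * h / denom


def meets_threshold(t: float, h: float, o: float, b: float,
                    threshold: float = E_THRESHOLD) -> bool:
    """Check whether a configuration satisfies E >= threshold."""
    return control_parameter(t, h, o, b) >= threshold


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the arithmetic.
    print(control_parameter(1.0, 2.0, 3.0, 1.0))
    print(meets_threshold(1.0, 1.0, 3.0, 1.0))
```

A check like this could sit in a configuration validator so that runs with E below the threshold are flagged before training begins. Whether the threshold carries over to a given setup is an empirical question that the summary does not settle.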