[arXiv] score: 0.47

DisagMoE: Computation-Communication overlapped MoE Training via Disaggregated AF-Pipe Parallelism

May 13, 2026
DisagMoE overlaps computation and all-to-all communication in MoE training via disaggregated AF-Pipe parallelism to reduce network-bound stalls when distributing trillion-parameter expert models across GPU nodes.
cs.LG cs.AI cs.DC