Stochastic Rounding Matches FP32 Optimizer State at 6 Bytes vs 10
May 29, 2026
Stochastic rounding removes the systematic bias that round-to-nearest introduces in BF16 training. Under round-to-nearest, the same rounding error can repeat at every step, so the accumulated error grows as O(n); zero-mean stochastic errors accumulate only as O(sqrt(n)). In a small MLP experiment with AdamW, BF16 parameters plus stochastically rounded optimizer states (6 bytes per parameter) match the quality of FP32 optimizer state (10 bytes per parameter).
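A minimal NumPy sketch of the effect described above, not the experiment's code: it emulates BF16 by keeping only the upper 16 bits of an FP32 value, and it repeatedly adds an update smaller than half the BF16 spacing near 1.0. The function names (nearest_round_to_bf16, stochastic_round_to_bf16, accumulate), the update size, and the step count are all assumptions chosen for illustration, and NaN/inf handling is ignored.

```python
import numpy as np

MASK_HI16 = np.uint32(0xFFFF0000)  # keeps the sign, exponent and 7 mantissa bits of BF16


def nearest_round_to_bf16(x):
    """Round float32 values to the BF16 grid with round-to-nearest-even."""
    bits = x.view(np.uint32)
    lsb = (bits >> np.uint32(16)) & np.uint32(1)  # tie-breaker toward even
    rounded = (bits + np.uint32(0x7FFF) + lsb) & MASK_HI16
    return rounded.view(np.float32)


def stochastic_round_to_bf16(x, rng):
    """Round float32 values to the BF16 grid, rounding up in magnitude with
    probability equal to the discarded fraction (unbiased in expectation)."""
    bits = x.view(np.uint32)
    noise = rng.integers(0, 1 << 16, size=bits.shape, dtype=np.uint32)
    rounded = (bits + noise) & MASK_HI16
    return rounded.view(np.float32)


def accumulate(round_fn, steps=500, update=1e-3, size=1024):
    """Apply `steps` small updates to weights stored on the BF16 grid."""
    w = np.ones(size, dtype=np.float32)
    for _ in range(steps):
        # Compute in float32, then round the stored value back to BF16.
        w = round_fn(w + np.float32(update))
    return w


rng = np.random.default_rng(0)
w_rtn = accumulate(nearest_round_to_bf16)
w_sr = accumulate(lambda x: stochastic_round_to_bf16(x, rng))

# The exact (unrounded) result would be 1.0 + 500 * 1e-3 = 1.5.
print("round-to-nearest mean:", float(w_rtn.mean()))
print("stochastic rounding mean:", float(w_sr.mean()))
```

In this toy setup round-to-nearest discards every update, because 1e-3 is below half the BF16 spacing of 2^-7 near 1.0, so the weights stay at exactly 1.0. Stochastic rounding lets each update land with probability proportional to its size, so the mean tracks the exact result of 1.5 up to sampling noise.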
HOW THIS AFFECTS YOU
● Builder: You can potentially cut parameter plus optimizer-state memory from 10 to 6 bytes per parameter without accuracy loss by switching to stochastic rounding in BF16 training runs.
● Researcher: The O(n) vs O(sqrt(n)) error-growth framing gives a clean theoretical justification for stochastic rounding in low-precision training, relevant to anyone studying quantization or mixed-precision optimization.
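The O(n) vs O(sqrt(n)) scaling can be sketched with a simplified error model. The symbols here are assumptions for illustration, not notation from the original post: \varepsilon_i is the rounding error at step i, b is a fixed per-step bias, and \sigma^2 is the per-step error variance. The stochastic case additionally assumes that the errors are independent across steps.

```latex
% Accumulated rounding error after n steps
E_n = \sum_{i=1}^{n} \varepsilon_i

% Round-to-nearest, worst case: the same error b repeats at every step
\varepsilon_i = b \quad\Longrightarrow\quad |E_n| = n\,|b| = O(n)

% Stochastic rounding: zero-mean, independent errors with variance \sigma^2
\mathbb{E}[\varepsilon_i] = 0,\quad \operatorname{Var}(\varepsilon_i) = \sigma^2
\quad\Longrightarrow\quad
\mathbb{E}[E_n] = 0,\qquad
\sqrt{\mathbb{E}\!\left[E_n^2\right]} = \sigma\sqrt{n} = O(\sqrt{n})
```

The second result follows because the variances of independent terms add, so the typical size of the sum grows with the square root of the number of steps rather than linearly.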