
Stochastic Rounding Matches FP32 Optimizer State at 6 Bytes vs 10

May 29, 2026

Stochastic rounding removes the systematic bias that round-to-nearest introduces in BF16 training. Round-to-nearest can make the same error at every step, so the accumulated error grows as O(n) over n steps, whereas zero-mean stochastic errors partly cancel and grow only as O(sqrt(n)). A small MLP experiment using AdamW shows that BF16 parameters plus stochastic-rounded optimizer states, at 6 bytes per parameter, match the quality of FP32 optimizer state at 10 bytes.
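The difference between the two rounding modes can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the linked post's code: the function names `nearest_round_to_bf16` and `stochastic_round_to_bf16`, the step size, and the step count are assumptions chosen for the demo, and the bit trick (add random low bits, then truncate to the top 16 bits of the FP32 word) is one common way to emulate BF16 stochastic rounding for finite inputs.

```python
import numpy as np

def nearest_round_to_bf16(x):
    """Round-to-nearest-even to BF16 precision (result stored as float32)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    lsb = (bits >> np.uint32(16)) & np.uint32(1)          # tie-break toward even
    rounded = (bits + np.uint32(0x7FFF) + lsb) & np.uint32(0xFFFF0000)
    return rounded.view(np.float32)

def stochastic_round_to_bf16(x, rng):
    """Round up in magnitude with probability equal to the discarded fraction."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    noise = rng.integers(0, 1 << 16, size=bits.shape, dtype=np.uint32)
    rounded = (bits + noise) & np.uint32(0xFFFF0000)      # truncate after adding noise
    return rounded.view(np.float32)

# Hypothetical demo values: a weight at 1.0 receives many small updates.
# BF16 spacing near 1.0 is 2**-7 (about 0.0078), so an update of 1e-3 is
# below half a spacing and round-to-nearest discards it every single time.
rng = np.random.default_rng(0)
step = np.float32(1e-3)
n_steps = 500
exact = 1.0 + n_steps * 1e-3                              # 1.5 in exact arithmetic

w_rn = np.ones(1024, dtype=np.float32)                    # round-to-nearest copies
w_sr = np.ones(1024, dtype=np.float32)                    # stochastic-rounding copies
for _ in range(n_steps):
    w_rn = nearest_round_to_bf16(w_rn + step)
    w_sr = stochastic_round_to_bf16(w_sr + step, rng)

print("exact target:          ", exact)
print("round-to-nearest mean: ", float(w_rn.mean()))     # stuck at the start value
print("stochastic mean:       ", float(w_sr.mean()))
print("stochastic std:        ", float(w_sr.std()))
```

With round-to-nearest every update is lost, so the error equals the full sum of the updates and grows linearly with the step count. With stochastic rounding each individual weight is noisy, but the noise is zero-mean, so the weights stay centered on the exact value.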

HOW THIS AFFECTS YOU

● builder: You can potentially cut memory for parameters plus optimizer state from 10 to 6 bytes per parameter, with no quality loss observed in the post's small MLP experiment, by switching to stochastic rounding in BF16 training runs.

● researcher: The O(n) vs O(sqrt(n)) error-growth framing gives a clean theoretical justification for stochastic rounding in low-precision training, relevant to anyone studying quantization or mixed-precision optimization.

SOURCE

https://convergentthinking.sh/posts/bias-compounds-variance-washes-out/
