
Xiaomi MiMo-V2.5 Hits 1,000 Tokens/s on 1T MoE Model, Single 8-GPU Node

June 8, 2026

Xiaomi's MiMo-V2.5-Pro-UltraSpeed uses speculative decoding to reach 1,000+ tokens/s on a 1-trillion-parameter MoE model running on a single standard 8-GPU node, without wafer-scale or on-chip SRAM hardware. The UltraSpeed API is priced at 3x standard rates and is available via application through June 23.

HOW THIS AFFECTS YOU

● Builder: You can apply for API access to roughly 10x faster inference on a frontier-scale model, which is relevant for latency-sensitive applications such as real-time agents or voice.

● Researcher: This appears to be the first production deployment of speculative decoding on a 1T-parameter MoE model, so the technical blog is worth reading for implementation details.

● Founder: Commodity hardware reaching Cerebras/Groq-class throughput changes the cost structure for high-throughput inference products; it is worth tracking as a pricing signal.
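
As a rough illustration of the speed-versus-price tradeoff, the sketch below compares generation time and relative cost for a fixed-length response. Only the 1,000 tokens/s rate and the 3x price multiplier come from the announcement. The 100 tokens/s baseline is inferred from the ~10x figure, per-token pricing is assumed, and the function name and 2,000-token response length are hypothetical.

```python
# Figures from the announcement: 1,000 tokens/s and a 3x price multiplier.
ULTRASPEED_TOKENS_PER_S = 1000.0
PRICE_MULTIPLIER = 3.0
# Assumption: baseline throughput inferred from the "~10x faster" claim.
BASELINE_TOKENS_PER_S = ULTRASPEED_TOKENS_PER_S / 10.0


def compare_tiers(output_tokens: int) -> dict:
    """Return generation time per tier, the speedup, and the cost ratio.

    Hypothetical helper; ignores time-to-first-token, queueing and network latency.
    """
    baseline_s = output_tokens / BASELINE_TOKENS_PER_S
    ultraspeed_s = output_tokens / ULTRASPEED_TOKENS_PER_S
    return {
        "baseline_seconds": baseline_s,
        "ultraspeed_seconds": ultraspeed_s,
        "speedup": baseline_s / ultraspeed_s,
        # Assumption: pricing is per token, so the same output costs 3x.
        "cost_ratio": PRICE_MULTIPLIER,
    }


if __name__ == "__main__":
    result = compare_tiers(2000)  # hypothetical 2,000-token response
    for key, value in result.items():
        print(f"{key}: {value:.1f}")
```

Under these assumptions, a 2,000-token response drops from about 20 seconds to about 2 seconds of generation time, at three times the token cost.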

Source: x.com
