[r/LocalLLaMA] score: 0.17

Gemma 4 26B Hits 578 Tok/s on One RTX 5090

May 8, 2026
A community benchmark shows Gemma 4 26B (AWQ 4-bit, MoE A4B architecture) reaching 578 tok/s on a single RTX 5090 using vLLM 0.19.2rc1 with DFlash speculative decoding at 13 draft tokens, up from a 228 tok/s baseline, a roughly 2.5x throughput gain. Single-GPU deployment of capable MoE models just got significantly more practical.
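As a quick sanity check on the reported figures, the speedup and the implied average time per generated token can be worked out directly from the two throughput numbers. This is a minimal sketch: the variable and function names are illustrative, and the per-token latency assumes the tok/s figures describe a single steady decode stream, which the summary does not state.

```python
# Throughput figures as reported in the benchmark summary (tokens per second).
baseline_tok_s = 228.0   # vLLM without speculative decoding
dflash_tok_s = 578.0     # vLLM with DFlash speculative decoding, 13 draft tokens


def speedup(new_tok_s: float, old_tok_s: float) -> float:
    """Return the throughput ratio new / old."""
    return new_tok_s / old_tok_s


def ms_per_token(tok_s: float) -> float:
    """Average milliseconds per generated token.

    Assumption: throughput is measured on one steady decode stream.
    """
    return 1000.0 / tok_s


ratio = speedup(dflash_tok_s, baseline_tok_s)
baseline_ms = ms_per_token(baseline_tok_s)
dflash_ms = ms_per_token(dflash_tok_s)

print(f"speedup: {ratio:.2f}x")
print(f"baseline: {baseline_ms:.2f} ms/token")
print(f"with DFlash: {dflash_ms:.2f} ms/token")
```

The ratio comes out slightly above 2.5, consistent with the rounded figure in the summary.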