[r/LocalLLaMA]

Gemma 4 26B Hits 600 Tok/s on One RTX 5090

May 8, 2026

A community benchmark shows Gemma 4 26B (AWQ 4-bit quantization; MoE architecture, with the A4B suffix denoting about 4B active parameters per token) reaching 578 tok/s on a single RTX 5090. The run used vLLM 0.19.2rc1 with DFlash speculative decoding at 13 draft tokens, up from a 228 tok/s baseline: a roughly 2.5x throughput gain (578 / 228 ≈ 2.54). Single-GPU deployment of capable MoE models just got significantly more practical.
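To put the numbers in context, here is a small back-of-envelope sketch. It recomputes the reported speedup and, as an assumption not taken from the post, uses the textbook speculative-decoding model in which each of the k drafted tokens is accepted independently with probability alpha. DFlash's real acceptance behaviour may differ, and the function names are hypothetical helpers for illustration only.

```python
"""Back-of-envelope numbers for the Gemma 4 26B / RTX 5090 benchmark.

Assumption (not from the source): the simple i.i.d. acceptance model of
speculative decoding, where every drafted token is accepted with the same
probability alpha. This is NOT a description of how DFlash works internally.
"""


def speedup(spec_tok_s: float, base_tok_s: float) -> float:
    """Throughput with speculative decoding divided by baseline throughput."""
    return spec_tok_s / base_tok_s


def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    With k draft tokens and independent acceptance probability alpha, the
    target model keeps the accepted prefix and adds one token of its own,
    which gives (1 - alpha**(k + 1)) / (1 - alpha). At alpha == 1 every
    draft is accepted, so the pass yields k + 1 tokens.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


if __name__ == "__main__":
    # Reported figures: 578 tok/s with DFlash vs. 228 tok/s baseline.
    print(f"measured speedup: {speedup(578, 228):.2f}x")
    # Hypothetical acceptance rates, 13 draft tokens as in the benchmark.
    for alpha in (0.5, 0.6, 0.7, 0.8):
        tokens = expected_tokens_per_step(alpha, 13)
        print(f"alpha={alpha}: {tokens:.2f} tokens per verification pass")
```

Tokens per verification pass is only an upper bound on wall-clock speedup under this model, because it ignores the cost of running the drafter and of verifying 13 extra positions. The sketch should not be used to infer DFlash's actual acceptance rate from the measured 2.5x.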


SOURCE

https://www.reddit.com/r/LocalLLaMA/comments/1t796qe/gemma_4_26b_hits_600_toks_on_one_rtx_5090/
