[r/LocalLLaMA]score: 0.22

z-lab released gemma-4-26B-A4B-it-DFlash. Anybody tried it yet?

May 8, 2026

z-lab released DFlash for Gemma-4 26B-A4B, a stateful speculative-decoding drafter positioned as an alternative to Multi-Token Prediction (MTP). DFlash drafts a block of tokens in parallel via block diffusion and keeps its KV cache, RoPE offsets, and context buffers persistent across iterations. In theory, that lets it hold its speedup as context grows, where MTP's gains degrade. Worth benchmarking for long-context inference workloads.
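For anyone new to the draft-then-verify idea behind speculative decoding, here is a minimal toy sketch in plain Python. It is not DFlash's implementation and does not model block diffusion, the KV cache, or RoPE offsets. `target_next`, `draft_block`, and `speculative_generate` are hypothetical names made up for illustration, the "models" are simple arithmetic functions, and the drafter here guesses sequentially rather than in parallel. It only shows why accepted draft tokens reduce the number of target-model passes while leaving greedy output unchanged.

```python
from typing import List, Tuple

VOCAB = 50  # toy vocabulary size (assumption for illustration)


def target_next(tokens: List[int]) -> int:
    """Toy stand-in for the target model: deterministic greedy next token."""
    return (sum(tokens) * 31 + len(tokens)) % VOCAB


def draft_block(tokens: List[int], k: int) -> List[int]:
    """Toy stand-in for a drafter: proposes k tokens, wrong on purpose sometimes."""
    ctx = list(tokens)
    block = []
    for _ in range(k):
        guess = target_next(ctx)
        if len(ctx) % 4 == 3:  # inject a deliberate drafting error
            guess = (guess + 1) % VOCAB
        block.append(guess)
        ctx.append(guess)
    return block


def greedy_generate(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target pass per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens


def speculative_generate(prompt: List[int], n_new: int, k: int) -> Tuple[List[int], int]:
    """Greedy speculative decoding: accept the longest draft prefix the target agrees with."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < n_new:
        draft = draft_block(tokens, k)
        target_passes += 1  # a real target scores all k draft positions in one pass
        accepted: List[int] = []
        for d in draft:
            t = target_next(tokens + accepted)
            if d == t:
                accepted.append(d)
            else:
                accepted.append(t)  # replace the first mismatch with the target's token
                break
        else:
            # Whole block accepted: the same pass also yields one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + n_new], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    out, passes = speculative_generate(prompt, n_new=16, k=4)
    print("matches greedy:", out == greedy_generate(prompt, 16))
    print("target passes:", passes, "vs 16 for plain greedy decoding")
```

The speedup in a real setup depends on how many draft tokens get accepted per pass and on how cheap the drafter is, which is exactly what would need measuring at different context lengths.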

SOURCE

https://huggingface.co/z-lab/gemma-4-26B-A4B-it-DFlash
