[r/LocalLLaMA] score: 0.18

BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.

May 22, 2026

BeeLlama v0.2.0 reports token generation speedups of up to 4.40x for Qwen 3.6 27B (164 tps) and up to 4.93x for Gemma 4 31B (177.8 tps) on a single RTX 3090, using an updated DFlash speculative decoding implementation with K/V projection caching and improved CUDA execution. Practitioners running large models on consumer hardware can get more than 4x generation throughput without additional GPUs, while prompt processing speed stays near baseline.
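The headline figures also imply a baseline generation rate, which the post does not quote here. The sketch below is a back-of-the-envelope check that divides each reported peak rate by its speedup factor. It assumes both numbers come from the same run; because they are "up to" figures, the result is only an approximation. The function name `implied_baseline_tps` is made up for this illustration.

```python
def implied_baseline_tps(accelerated_tps: float, speedup: float) -> float:
    """Baseline tokens/s implied by an accelerated rate and its speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return accelerated_tps / speedup


# (peak tokens/s, speedup factor) as reported in the post title.
reported = {
    "Qwen 3.6 27B": (164.0, 4.40),
    "Gemma 4 31B": (177.8, 4.93),
}

# Assumption: peak rate and speedup refer to the same benchmark run.
baselines = {
    name: implied_baseline_tps(tps, speedup)
    for name, (tps, speedup) in reported.items()
}

for name, base in baselines.items():
    print(f"{name}: ~{base:.1f} tps implied baseline")
```

Under those assumptions, both models would start from roughly 36 to 37 tps without speculative decoding on the same card.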

resources

SOURCE

https://www.reddit.com/r/LocalLLaMA/comments/1tkpz2y/beellama_v020_major_dflash_update_single_rtx_3090/
