[r/LocalLLaMA]
BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.
May 22, 2026
BeeLlama v0.2.0 reaches token generation speedups of up to 4.40x for Qwen 3.6 27B (164 tokens/s) and up to 4.93x for Gemma 4 31B (177.8 tokens/s) on a single RTX 3090. The gains come from an updated DFlash speculative decoding implementation with K/V projection caching and improved CUDA execution. For people running large models on consumer hardware, this means several-fold faster generation without adding GPUs, while prompt processing speed stays near baseline.
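The post gives each peak throughput together with its speedup factor, so you can back out the implied baseline (non-DFlash) generation speed as peak / speedup. This is a minimal sketch. It assumes each speedup is measured as DFlash tokens/s divided by baseline tokens/s on the same GPU and workload. Because both figures are "up to" peaks, treat the result as a rough estimate, not a measured baseline. The function name `implied_baseline_tps` is hypothetical.

```python
# Peak figures reported in the post: (tokens per second with DFlash, speedup factor).
reported = {
    "Qwen 3.6 27B": (164.0, 4.40),
    "Gemma 4 31B": (177.8, 4.93),
}


def implied_baseline_tps(peak_tps: float, speedup: float) -> float:
    """Return the baseline tokens/s implied by a peak rate and its speedup.

    Assumes speedup = peak_tps / baseline_tps (hypothetical helper, not part of BeeLlama).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return peak_tps / speedup


for model, (tps, speedup) in reported.items():
    print(f"{model}: ~{implied_baseline_tps(tps, speedup):.1f} tps baseline")
```

Running it prints roughly 37.3 tps for Qwen 3.6 27B and 36.1 tps for Gemma 4 31B. This suggests both models start from a similar baseline on the 3090, so most of the difference in peak throughput comes from the speedup factor.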