
A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU)

June 1, 2026

Running a 26B-parameter Gemma 4 MTP drafter with speculative decoding on a 2016 Xeon E5-2620 v4 with 128 GB of DDR3 and no GPU means bypassing ollama and standard llama.cpp in favor of lower-level inference tooling that exposes memory bandwidth optimizations. DDR3 delivers roughly one-fifth to one-sixth the bandwidth of current laptop RAM, so the memory wall is the dominant constraint. The post details how custom quantization and MTP drafting configuration can make this hardware viable where off-the-shelf tools cannot.
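To make the memory-wall argument concrete, here is a rough back-of-the-envelope sketch; it is not taken from the post. It assumes decoding is purely bandwidth-bound, so each generated token requires streaming every active weight from RAM once, and it reads "A4B" in the title as roughly 4B active parameters per token. The function names, the 40 GB/s bandwidth figure, the 4-bit quantization, and the acceptance rate are all illustrative assumptions, not measurements. The second function uses the standard speculative-decoding estimate for expected tokens per verification pass under an independent per-token acceptance rate.

```python
def decode_tokens_per_second_upper_bound(
    bandwidth_gb_s: float,
    active_params_billion: float,
    bits_per_weight: float,
) -> float:
    """Upper bound on decode speed if every active weight is read once per token.

    Ignores KV-cache traffic, compute time, and cache effects, so real
    throughput will be lower. All inputs are hypothetical example values.
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


def expected_tokens_per_verification(acceptance_rate: float, draft_length: int) -> float:
    """Expected tokens produced per target-model pass in speculative decoding.

    Assumes each drafted token is accepted independently with probability
    `acceptance_rate`; the result is (1 - a^(k+1)) / (1 - a).
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if draft_length < 0:
        raise ValueError("draft_length must be non-negative")
    if acceptance_rate == 1.0:
        return float(draft_length + 1)
    return (1 - acceptance_rate ** (draft_length + 1)) / (1 - acceptance_rate)


if __name__ == "__main__":
    # Illustrative numbers only: 40 GB/s of usable bandwidth, 4B active
    # parameters, 4-bit weights.
    base = decode_tokens_per_second_upper_bound(40, 4, 4)
    # Hypothetical drafter: 3 drafted tokens, 70% acceptance.
    gain = expected_tokens_per_verification(0.7, 3)
    print(f"bandwidth-bound ceiling: {base:.1f} tok/s")
    print(f"expected tokens per verification pass: {gain:.2f}")
```

Under these assumptions, bandwidth and bytes per token set the ceiling, which is why quantization (fewer bits per weight) and drafting (more tokens per pass over the weights) are the two levers the summary mentions. The gain from drafting also depends on the cost of running the drafter itself, which this sketch leaves out.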

SOURCE

https://point.free/blog/gemma-4-on-a-2016-xeon/
