[HN]score: 0.27
A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU)
June 1, 2026
Running a 26B-parameter Gemma 4 MTP drafter with speculative decoding on a 2016 Xeon E5-2620 v4 with 128 GB of DDR3 and no GPU means bypassing Ollama and standard llama.cpp. The post instead uses lower-level inference tooling that exposes memory-bandwidth optimizations. DDR3 bandwidth is roughly 5-6x lower than that of current laptop RAM, so the memory wall, not compute, is the dominant constraint. The post details how custom quantization and MTP drafting configuration make this hardware viable where off-the-shelf tools are not.
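To make the memory-wall argument concrete, here is a rough back-of-the-envelope sketch, not taken from the post. It assumes decoding is purely bandwidth-bound, so each generated token requires reading every active weight once, and it ignores KV-cache traffic and compute. The function names, the 40 GB/s bandwidth figure, and the 4-bit weight size are illustrative assumptions. The 4B active-parameter count is a reading of "A4B" in the title. The speculative-decoding estimate uses the standard closed form for expected tokens per verification pass under an independent per-token acceptance rate.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_b: float,
                         bits_per_weight: float) -> float:
    """Upper bound on decode tokens/s if every active weight is read once per token.

    bandwidth_gb_s  -- sustained memory bandwidth in GB/s (assumed, not measured)
    active_params_b -- active parameters per token, in billions
    bits_per_weight -- average quantized weight size in bits
    """
    gb_read_per_token = active_params_b * bits_per_weight / 8.0
    return bandwidth_gb_s / gb_read_per_token


def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass of the target model.

    Assumes each drafted token is accepted independently with probability
    accept_rate, and that the pass always yields at least one token.
    """
    if accept_rate >= 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


if __name__ == "__main__":
    # Hypothetical inputs: 40 GB/s bandwidth, 4B active params, 4-bit weights.
    base = decode_ceiling_tok_s(40.0, 4.0, 4.0)
    # Hypothetical drafter: 3 drafted tokens, 50% per-token acceptance.
    gain = expected_tokens_per_pass(0.5, 3)
    print(f"bandwidth ceiling: {base:.1f} tok/s")
    print(f"expected tokens per verification pass: {gain:.3f}")
```

Under these assumptions the ceiling scales linearly with bandwidth and inversely with bytes per weight, which is why quantization and drafting are the two levers the summary names. The per-pass figure is an optimistic bound on speedup, since it leaves out the cost of running the drafter itself.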