[r/LocalLLaMA]

EAGLE3 Speculative Decoding Merged into llama.cpp

June 12, 2026

EAGLE3, a speculative decoding method in which the draft model receives hidden-state guidance from the main model rather than operating independently, is now available in llama.cpp after six months of development. Unlike MTP-style (multi-token prediction) approaches, it gives the draft model richer context, which typically improves draft acceptance rates and therefore inference throughput.
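To make the link between acceptance rate and throughput concrete, here is a minimal sketch of the generic greedy draft-and-verify loop that speculative decoding builds on. It is not llama.cpp's implementation and it does not model EAGLE3's hidden-state guidance. The `target` and `draft` functions are hypothetical toy stand-ins for the main and draft models, and `speculative_decode` is an illustrative name, not a real API.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # context -> greedy next token


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], n_new: int,
                       k: int = 4) -> Tuple[List[Token], int, float]:
    """Greedy draft-and-verify. Returns (tokens, target_passes, acceptance_rate)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = proposed = accepted = 0
    while len(out) < goal:
        # 1) The cheap draft model proposes k tokens autoregressively.
        ctx, guesses = list(out), []
        for _ in range(k):
            guess = draft(ctx)
            guesses.append(guess)
            ctx.append(guess)
        # 2) The target checks every guess. A real engine scores all k
        #    positions in one batched forward pass; here we count it as one.
        passes += 1
        proposed += k
        for guess in guesses:
            tok = target(out)
            out.append(tok)          # the emitted token is always the target's
            if tok != guess:
                break                # first mismatch: drop the remaining guesses
            accepted += 1
        else:
            out.append(target(out))  # all k accepted: one bonus token
    rate = accepted / proposed if proposed else 0.0
    return out[:goal], passes, rate


# Hypothetical toy models (assumptions for illustration only).
def target(ctx: List[Token]) -> Token:
    return (ctx[-1] * 3 + 1) % 17


def draft(ctx: List[Token]) -> Token:
    # Agrees with the target except when the last token is a multiple of 5.
    tok = target(ctx)
    return tok if ctx[-1] % 5 else (tok + 1) % 17


if __name__ == "__main__":
    tokens, passes, rate = speculative_decode(target, draft, [1], n_new=32)
    print(f"target passes: {passes} for 32 tokens, acceptance rate: {rate:.2f}")
```

The output always matches what the target alone would generate; the draft only changes how many target passes are needed. A higher acceptance rate means more tokens are emitted per target pass, which is the quantity that richer draft context aims to improve.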

HOW THIS AFFECTS YOU

● Builder: You can enable faster local inference today by pulling the latest llama.cpp and using EAGLE3 speculative decoding, with no external serving infrastructure required.

● Researcher: EAGLE3's guided-draft architecture offers a concrete alternative to MTP for studying improvements in speculative decoding acceptance rates.

Read original: github.com
