● builder: You can enable faster local inference today by pulling the latest llama.cpp and using EAGLE3 speculative decoding, with no external serving infrastructure required.
● researcher: EAGLE3's guided-draft architecture offers a concrete alternative to MTP for studying speculative decoding acceptance-rate improvements.