[HN]score: 0.36
Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model
May 12, 2026
Cactus open-sourced Needle, a 26M-parameter tool-calling model distilled from Gemini that reaches 6000 tok/s prefill and 1200 tok/s decode on consumer hardware. Aimed at on-device agentic workflows, it treats function calling as structured retrieval rather than generation, which makes deployment on budget phones feasible. This matters for edge AI practitioners building lightweight agents without cloud inference costs.
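
To put the quoted throughput in per-request terms, here is a minimal back-of-the-envelope sketch. It assumes latency is simply prefill time plus decode time at the stated rates, ignoring model load, tokenization, and scheduling overhead. The function name and the request size (600 prompt tokens, 60 output tokens) are hypothetical and chosen only for illustration; they do not come from the Needle release.

```python
# Throughput figures quoted in the summary (tokens per second).
PREFILL_TOK_PER_S = 6000
DECODE_TOK_PER_S = 1200


def estimate_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Rough latency estimate: prefill time plus decode time, nothing else."""
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prefill_s = prompt_tokens / PREFILL_TOK_PER_S
    decode_s = output_tokens / DECODE_TOK_PER_S
    return prefill_s + decode_s


if __name__ == "__main__":
    # Hypothetical request: 600 prompt tokens (tool schemas plus user query)
    # and a 60-token tool call as output.
    latency = estimate_latency_s(600, 60)
    print(f"{latency * 1000:.0f} ms")  # prints "150 ms"
```

Under these assumptions, such a call would take roughly 100 ms of prefill plus 50 ms of decode. Real numbers will depend on the device and on how the quoted rates were measured.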