[GH]score: 0.59
Cactus Needle: 26M Parameter Efficient Model
May 13, 2026
Cactus Needle is a 26M-parameter Simple Attention Network distilled from Gemini 3.1 and released with open weights. It reaches 6,000 tokens/sec prefill and 1,200 tokens/sec decode on consumer hardware. That efficiency makes it compelling for edge deployment, though the 26M scale limits how well it handles complex reasoning tasks.
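To put the quoted figures in perspective, here is a rough back-of-envelope sketch in Python. It uses only the numbers stated above (26M parameters, 6,000 tokens/sec prefill, 1,200 tokens/sec decode). The function names, the example prompt and reply lengths, and the bytes-per-parameter values are illustrative assumptions, not details from the release. It also ignores KV cache, activations, and runtime overhead.

```python
# Back-of-envelope estimates from the quoted throughput and parameter count.
# All helper names and example workloads below are hypothetical.

PREFILL_TPS = 6000.0    # tokens/sec, as quoted in the summary
DECODE_TPS = 1200.0     # tokens/sec, as quoted in the summary
N_PARAMS = 26_000_000   # 26M parameters, as quoted in the summary


def estimate_latency_s(prompt_tokens: int, output_tokens: int,
                       prefill_tps: float = PREFILL_TPS,
                       decode_tps: float = DECODE_TPS) -> float:
    """Approximate end-to-end time: prefill the prompt, then decode the reply."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


def weight_footprint_mb(n_params: int = N_PARAMS,
                        bytes_per_param: float = 2.0) -> float:
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes).

    bytes_per_param is an assumption: 4.0 for fp32, 2.0 for fp16/bf16,
    0.5 for 4-bit quantization. The release precision is not stated above.
    """
    return n_params * bytes_per_param / 1e6


if __name__ == "__main__":
    # Assumed workload: 600-token prompt, 120-token reply.
    print(f"latency: {estimate_latency_s(600, 120):.2f} s")
    for label, bpp in [("fp32", 4.0), ("fp16", 2.0), ("int4", 0.5)]:
        print(f"{label}: {weight_footprint_mb(bytes_per_param=bpp):.0f} MB")
```

Under these assumptions, a 600-token prompt with a 120-token reply takes roughly 0.2 seconds, and the weights occupy about 52 MB at 16-bit precision. Actual numbers will depend on the hardware and runtime, which the summary does not specify.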