08:27 CET · Wednesday, May 13, 2026

shipfeed

§ agents · cluster

Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

yesterday · primary fetch · 1 source · cluster d1e27649 · updated yesterday

Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M-parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.

We have long been frustrated by how little effort goes into building agentic models that run on budget phones, so we investigated and arrived at an observation: agentic experiences are built on tool calling, and massive models are overkill for it. Tool calling is fundamentally retrieval-and-assembly (match query to tool name, extract argument values, emit JSON), not reasoning. Cross-attention is the right primitive for this, and FFN parameters are wasted at this scale.

Simple Attention Networks: the entire model is just attention and gating, no MLPs anywhere.
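To make the "retrieval-and-assembly" framing concrete, here is a minimal sketch of the three steps (match the query to a tool name, extract argument values, emit JSON) using plain keyword overlap and regular expressions. This is not how Needle works internally: the post describes a learned attention-only model, and the `TOOLS` registry, its keywords, the argument patterns, and the `call_tool` helper are all hypothetical names invented for illustration.

```python
import json
import re

# Hypothetical tool registry. The tool names, keywords, and argument
# patterns are illustrative assumptions, not Needle's actual schema.
TOOLS = {
    "set_timer": {
        "keywords": {"timer", "countdown"},
        "args": {"minutes": re.compile(r"(\d+)\s*(?:minutes|minute|min)\b")},
    },
    "send_message": {
        "keywords": {"message", "text"},
        "args": {"recipient": re.compile(r"\bto\s+([A-Z][a-z]+)")},
    },
}


def call_tool(query: str) -> str:
    """Match the query to a tool, extract arguments, and emit JSON."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))

    # Retrieval: pick the tool whose keywords overlap the query the most.
    # Ties (including zero overlap) fall back to the first registered tool.
    name = max(TOOLS, key=lambda t: len(TOOLS[t]["keywords"] & tokens))

    # Assembly: copy argument values straight out of the query text.
    arguments = {}
    for arg, pattern in TOOLS[name]["args"].items():
        match = pattern.search(query)
        if match:
            arguments[arg] = match.group(1)

    return json.dumps({"name": name, "arguments": arguments})


print(call_tool("Set a timer for 10 minutes"))
print(call_tool("Send a text to Anna"))
```

Nothing in this toy pipeline reasons about the request: it selects and copies. That is the property the post argues a small attention-based model can learn, with attention playing the role that the keyword overlap and regex lookups play here.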

Needle is an experimental run for single-shot function calling for consumer devices (phones, watches, glasses...).

Training: pretrained on 200B tokens across 16 TPU v6e (27 hours); post-trained on 2B tokens of synthesized function-calling data (45 minutes); dataset synthesized via Gemini with 15 tool categories (timers, messaging, navigation, smart home, etc.).

You can test it right now and finetune on your Mac/PC…
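The training figures quoted above imply some back-of-the-envelope throughput numbers, worked out below. The sketch assumes that "16 TPU v6e" means 16 chips and that the quoted hours and minutes are wall-clock training time. The post does not say what hardware the post-training ran on, so only an aggregate rate is computed for it.

```python
# Figures quoted in the post.
PRETRAIN_TOKENS = 200e9
PRETRAIN_HOURS = 27
TPU_CHIPS = 16            # assumption: "16 TPU v6e" means 16 chips
POSTTRAIN_TOKENS = 2e9
POSTTRAIN_MINUTES = 45
PARAMS = 26e6

# Aggregate pretraining throughput in tokens per second of wall-clock time.
pretrain_rate = PRETRAIN_TOKENS / (PRETRAIN_HOURS * 3600)

# Per-chip share of that throughput, under the 16-chip assumption.
per_chip_rate = pretrain_rate / TPU_CHIPS

# Aggregate post-training throughput (hardware not stated in the post).
posttrain_rate = POSTTRAIN_TOKENS / (POSTTRAIN_MINUTES * 60)

# Pretraining tokens seen per model parameter.
tokens_per_param = PRETRAIN_TOKENS / PARAMS

print(f"pretraining:   {pretrain_rate:,.0f} tok/s total, {per_chip_rate:,.0f} tok/s per chip")
print(f"post-training: {posttrain_rate:,.0f} tok/s total")
print(f"tokens per parameter: {tokens_per_param:,.0f}")
```

Under those assumptions, pretraining averaged roughly 2.06 million tok/s in aggregate (about 129 thousand per chip), post-training about 741 thousand tok/s, and the model saw about 7,700 pretraining tokens per parameter.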

read full article on github.com
§ sources · 1 publication · timeline below
  1. github.com · Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model · primary