Espresso Bypasses CoreML for 4.76x Faster Apple Neural Engine Inference

July 5, 2026

Espresso uses reverse-engineered private APIs to compile MIL programs directly for the Neural Engine on Apple Silicon. It reports 1.08 ms/token on a 6-layer model, using fused multi-layer kernels and zero-copy I/O to avoid CoreML overhead and per-token recompilation.
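To put the two quoted figures side by side, here is a rough back-of-the-envelope sketch in Python. It assumes the 4.76x in the headline is the ratio of CoreML per-token latency to Espresso per-token latency on the same 6-layer model; the summary does not state this explicitly, so the implied baseline is an estimate and not a reported measurement.

```python
# Figures quoted in the summary and headline.
ESPRESSO_MS_PER_TOKEN = 1.08   # reported latency on a 6-layer model
SPEEDUP_VS_COREML = 4.76       # headline speedup factor


def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / ms_per_token


def implied_baseline_ms(ms_per_token: float, speedup: float) -> float:
    """Estimate the baseline latency.

    ASSUMPTION: speedup = baseline latency / Espresso latency,
    measured on the same model. The source does not confirm this.
    """
    return ms_per_token * speedup


espresso_tps = tokens_per_second(ESPRESSO_MS_PER_TOKEN)
baseline_ms = implied_baseline_ms(ESPRESSO_MS_PER_TOKEN, SPEEDUP_VS_COREML)
baseline_tps = tokens_per_second(baseline_ms)

print(f"Espresso: {espresso_tps:.0f} tokens/s")
print(f"Implied CoreML baseline: {baseline_ms:.2f} ms/token, {baseline_tps:.0f} tokens/s")
```

Under that assumption, 1.08 ms/token works out to roughly 926 tokens per second, against an implied CoreML baseline of about 5.14 ms/token.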

HOW THIS AFFECTS YOU

● Builder: You can achieve significantly lower latency for on-device transformer inference using pure Swift.

● Researcher: The framework supports full training on ANE via forward and backward passes with gradient accumulation.
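For readers unfamiliar with gradient accumulation, the idea can be sketched in a few lines of plain Python. This is a generic illustration on a toy one-parameter model, not Espresso's API; the function names `grad_sum` and `accumulated_grad` are hypothetical. It shows that summing gradients over small micro-batches and normalizing once gives the same update as a single large batch.

```python
def grad_sum(w, xs, ys):
    """Sum of per-example gradients of 0.5 * (w*x - y)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys))


def accumulated_grad(w, xs, ys, micro_batch):
    """Accumulate gradients over micro-batches, then average once at the end."""
    total = 0.0
    for start in range(0, len(xs), micro_batch):
        end = start + micro_batch
        # One forward/backward pass per micro-batch; no weight update yet.
        total += grad_sum(w, xs[start:end], ys[start:end])
    return total / len(xs)


# Toy data for y = 2x, with an initial weight guess of 0.5.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_sum(w, xs, ys) / len(xs)                 # one batch of 4
accum = accumulated_grad(w, xs, ys, micro_batch=2)   # two micro-batches of 2

# A single optimizer step using the accumulated gradient.
learning_rate = 0.1
w_next = w - learning_rate * accum
print(full, accum, w_next)
```

The practical appeal is memory: only one micro-batch of activations needs to be resident at a time, which matters on a constrained accelerator.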

Read original: github.com
