[r/LocalLLaMA] score: 0.16

LFM2.5-230M Runs at 1,400 tok/s In-Browser via Custom WebGPU Kernels

June 25, 2026

LiquidAI's LFM2.5-230M runs fully client-side at a reported 1,400 tokens per second (roughly 0.7 ms per token) on an M4 Max using custom WebGPU kernels. A live demo is hosted on Hugging Face Spaces, and GGUF weights are available. The figure is a useful throughput reference point for on-device inference with no server-side compute.
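To put the throughput figure in concrete terms, here is a minimal sketch that converts a sustained decode rate into per-token latency and total generation time. The 1,400 tok/s rate comes from the post; the 200-token reply length and the function names are assumptions for illustration, and the arithmetic ignores model download, warm-up, and prompt processing time.

```typescript
// Convert a sustained decode rate into per-token latency (milliseconds).
function msPerToken(tokensPerSecond: number): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return 1000 / tokensPerSecond;
}

// Estimate decode time for a reply of tokenCount tokens at a steady rate.
function generationMs(tokenCount: number, tokensPerSecond: number): number {
  return tokenCount * msPerToken(tokensPerSecond);
}

const reportedRate = 1400; // tok/s, as stated in the post
const assumedReplyTokens = 200; // hypothetical reply length

console.log(msPerToken(reportedRate).toFixed(2)); // prints "0.71"
console.log(generationMs(assumedReplyTokens, reportedRate).toFixed(0)); // prints "143"
```

Under these assumptions a short reply finishes decoding in well under a quarter of a second, which is why the rate matters for interactive use.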

HOW THIS AFFECTS YOU

● builder: You can ship fully local LLM features in web apps today using these WebGPU kernels and the GGUF weights, with no API costs and no network round-trip latency.

● designer: Real-time in-browser inference at this speed opens up low-latency generative UI patterns that were previously feasible only with server-side calls.
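For builders, a practical first step before loading any local model is checking whether the visitor's browser can actually provide a WebGPU adapter. The sketch below is a generic feature-detection pattern, not code from the demo: `NavigatorLike`, `hasWebGPU`, `pickBackend`, and the `"fallback"` path are hypothetical names, and what the fallback does (a server call, a CPU path, or disabling the feature) is left to the app. It relies only on the standard `navigator.gpu.requestAdapter()` call.

```typescript
// Minimal structural type so the sketch compiles without @webgpu/types.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

type Backend = "webgpu" | "fallback";

// Cheap synchronous check: is the WebGPU entry point exposed at all?
function hasWebGPU(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined;
}

// requestAdapter() resolves to null when no suitable GPU is available,
// so the presence of navigator.gpu alone is not sufficient.
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!hasWebGPU(nav)) return "fallback";
  try {
    const adapter = await nav.gpu!.requestAdapter();
    return adapter ? "webgpu" : "fallback";
  } catch {
    // Treat any failure during adapter discovery as "not available".
    return "fallback";
  }
}

// In a browser: pickBackend(navigator as NavigatorLike).then((b) => console.log(b));
```

Taking the navigator object as a parameter keeps the check easy to exercise outside a browser with a stub object.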

read original ↗ v.redd.it
