
Gemma 4 Hits 255 tok/s on WebGPU via Agentic Kernel Writing

June 17, 2026

Custom WebGPU kernels reportedly pushed in-browser Gemma 4 inference from 84 tok/s to 255 tok/s, roughly a 3x speedup, using an agentic optimization loop. The demo and kernels are said to be publicly released. The surrounding narrative about safeguard rollbacks is satirical; the substantive claims are the kernel optimization approach and the performance numbers.

HOW THIS AFFECTS YOU

● builder: If the kernel code is real and reproducible, this sets a new bar for browser-side LLM inference throughput worth benchmarking against your own WebGPU pipelines.

● researcher: Agentic iterative kernel optimization as a method for on-device inference tuning is worth evaluating, though the provenance of these results needs independent verification.
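
For readers who want to compare their own pipeline against the reported figures, the sketch below shows one way to compute tokens per second and a speedup ratio from timed decode runs. It is not taken from the released demo: the `DecodeRun` type and all function names are hypothetical, and it assumes you already record the token count and wall-clock decode time for each run.

```typescript
// Hypothetical helpers for comparing decode throughput; not from the released kernels.
interface DecodeRun {
  tokens: number;    // tokens generated during the run
  elapsedMs: number; // wall-clock decode time in milliseconds
}

// Throughput of a single run in tokens per second.
function tokensPerSecond(run: DecodeRun): number {
  if (run.elapsedMs <= 0) throw new Error("elapsedMs must be positive");
  return run.tokens / (run.elapsedMs / 1000);
}

// Median throughput across runs, which damps outliers such as a slow first run.
function medianTokensPerSecond(runs: DecodeRun[]): number {
  if (runs.length === 0) throw new Error("at least one run is required");
  const rates = runs.map(tokensPerSecond).sort((a, b) => a - b);
  const mid = Math.floor(rates.length / 2);
  return rates.length % 2 === 1 ? rates[mid] : (rates[mid - 1] + rates[mid]) / 2;
}

// Ratio of optimized throughput to baseline throughput.
function speedup(baselineTokPerSec: number, optimizedTokPerSec: number): number {
  if (baselineTokPerSec <= 0) throw new Error("baseline must be positive");
  return optimizedTokPerSec / baselineTokPerSec;
}
```

With the reported numbers, `speedup(84, 255)` comes out to about 3.04. Using the median rather than the mean is a design choice made here to limit the influence of a slow warm-up run; the source does not say how its figures were measured.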

Source: x.com
