jundot / omlx
May 10, 2026
omlx is a new LLM inference server built for Apple Silicon, featuring continuous batching and SSD-backed KV caching to extend effective context beyond unified-memory limits, all controllable from a native macOS menu bar UI. Developers running local inference on Macs who hit memory ceilings will find the SSD caching especially practical.
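The core idea behind SSD-backed KV caching can be sketched as a RAM-first cache that evicts cold entries to disk and reloads them on access. This is a toy illustration of the general spill-to-disk pattern, not omlx's actual implementation; the class name `SSDBackedKVCache` and its interface are assumptions for demonstration only.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SSDBackedKVCache:
    """Toy spill-to-SSD cache: the hottest entries live in RAM,
    older ones are evicted to disk and promoted back on access.
    Illustrative sketch only, not omlx's real data structure."""

    def __init__(self, max_in_memory=2, spill_dir=None):
        self.max_in_memory = max_in_memory
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="kvcache_")
        self.mem = OrderedDict()   # in-RAM entries, LRU order (oldest first)
        self.on_disk = {}          # key -> file path for spilled entries

    def _spill_oldest(self):
        # Evict the least recently used in-RAM entry to an SSD file.
        key, value = self.mem.popitem(last=False)
        path = os.path.join(self.spill_dir, f"{key}.pkl")
        with open(path, "wb") as f:
            pickle.dump(value, f)
        self.on_disk[key] = path

    def put(self, key, value):
        self.mem[key] = value
        self.mem.move_to_end(key)
        while len(self.mem) > self.max_in_memory:
            self._spill_oldest()

    def get(self, key):
        if key in self.mem:
            self.mem.move_to_end(key)  # refresh LRU position
            return self.mem[key]
        if key in self.on_disk:
            # Promote a spilled entry back into RAM, possibly
            # evicting another entry in its place.
            path = self.on_disk.pop(key)
            with open(path, "rb") as f:
                value = pickle.load(f)
            os.remove(path)
            self.put(key, value)
            return value
        raise KeyError(key)
```

A real inference server would spill serialized attention KV tensors rather than pickled Python objects, but the eviction-and-promotion flow is the same.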