jundot / omlx
May 10, 2026
omlx is a new LLM inference server built for Apple Silicon, featuring continuous batching and SSD-backed KV caching to extend effective context beyond unified-memory limits, all controllable from a native macOS menu bar UI. Developers running local inference on Macs who hit memory ceilings will find the SSD caching especially practical.
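The core idea behind SSD-backed KV caching can be sketched as a RAM-first cache that evicts cold entries to disk and reloads them on access. This is a toy illustration of the general spill-to-disk pattern, not omlx's actual implementation; the class name `SSDBackedKVCache` and its interface are assumptions for demonstration only.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SSDBackedKVCache:
    """Toy spill-to-SSD cache: the hottest entries live in RAM,
    older ones are evicted to disk and promoted back on access.
    Illustrative sketch only, not omlx's real data structure."""

    def __init__(self, max_in_memory=2, spill_dir=None):
        self.max_in_memory = max_in_memory
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="kvcache_")
        self.mem = OrderedDict()   # in-RAM entries, LRU order (oldest first)
        self.on_disk = {}          # key -> file path for spilled entries

    def _spill_oldest(self):
        # Evict the least recently used in-RAM entry to an SSD file.
        key, value = self.mem.popitem(last=False)
        path = os.path.join(self.spill_dir, f"{key}.pkl")
        with open(path, "wb") as f:
            pickle.dump(value, f)
        self.on_disk[key] = path

    def put(self, key, value):
        self.mem[key] = value
        self.mem.move_to_end(key)
        while len(self.mem) > self.max_in_memory:
            self._spill_oldest()

    def get(self, key):
        if key in self.mem:
            self.mem.move_to_end(key)  # refresh LRU position
            return self.mem[key]
        if key in self.on_disk:
            # Promote a spilled entry back into RAM, possibly
            # evicting another entry in its place.
            path = self.on_disk.pop(key)
            with open(path, "rb") as f:
                value = pickle.load(f)
            os.remove(path)
            self.put(key, value)
            return value
        raise KeyError(key)
```

A real inference server would spill serialized attention KV tensors rather than pickled Python objects, but the eviction-and-promotion flow is the same.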