[r/LocalLLaMA]

Atlas, the GB10 inference engine built for the community, is now open source, with breakneck inference speeds (Qwen3.6-35B-FP8 at 100+ tok/s)

May 6, 2026
Atlas, an open-source LLM inference engine built in pure Rust and CUDA, is now publicly available, targeting NVIDIA GB10 DGX Spark hardware. The stack eliminates PyTorch entirely, delivering a peak of 130 tok/s on Qwen3.5-35B NVFP4 with MTP K=2, a 3.0-3.3x throughput improvement over vLLM at a 2.5 GB image footprint. ML engineers running edge inference or on-premise deployments on GB10 silicon should evaluate it immediately: the sub-2-minute cold start and hand-tuned CUDA kernels directly address the Python runtime overhead that conventional serving frameworks leave unoptimized.
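A rough intuition for why MTP (multi-token prediction) with K=2 lifts throughput: each verification step commits the base token plus however many of the K drafted tokens are accepted in sequence. A minimal sketch in Rust, assuming a simplified model where each draft token is accepted independently with a hypothetical probability `p` (the summary does not describe Atlas's actual acceptance logic, and `expected_tokens_per_step` is an illustrative name, not an Atlas API):

```rust
/// Expected tokens committed per verification step when the engine drafts
/// `k` extra tokens via MTP and each draft is accepted with probability `p`
/// while acceptance continues. Illustrative model only.
fn expected_tokens_per_step(p: f64, k: u32) -> f64 {
    // Geometric series 1 + p + p^2 + ... + p^k: the step always commits
    // the verified base token, plus each surviving draft token.
    (0..=k).map(|i| p.powi(i as i32)).sum()
}

fn main() {
    // With K = 2 drafts and a 70% per-token acceptance rate, each step
    // commits about 2.19 tokens on average instead of 1.
    let gain = expected_tokens_per_step(0.7, 2);
    println!("expected tokens/step at K=2, p=0.7: {gain:.2}");
}
```

Under these assumptions a realistic acceptance rate alone yields roughly 2x more tokens per forward pass, with the remainder of the reported 3.0-3.3x gain plausibly coming from the kernel-level and runtime optimizations.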