AMD Strix Halo NPU Now Usable for Hybrid LLM Inference via ROCm
June 24, 2026
AMD Ryzen AI Strix Halo devices can now run LLMs in hybrid NPU+iGPU mode, with the NPU handling prompt processing while the iGPU handles token generation. ROCm support has matured enough to run FastFlowLM's NPU-compatible models on hardware such as the Ryzen AI Max+ 395.
HOW THIS AFFECTS YOU
● Builder: If you're targeting on-device inference on AMD Strix Halo hardware, hybrid NPU+iGPU mode is now viable and worth benchmarking against pure Vulkan/GGUF setups.
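Because hybrid mode splits prompt processing from token generation, a useful comparison reports the two phases separately. The sketch below is one possible way to do that, not part of any AMD, ROCm, or FastFlowLM tooling. It assumes each backend can be wrapped as a Python iterable that yields tokens as they are produced; `measure_stream` and `fake_backend` are hypothetical names introduced only for illustration.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_stream(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Time a token stream.

    Reports time to first token (dominated by prompt processing) and
    decode throughput over the tokens that follow the first one.
    """
    start = clock()
    first = None
    last = start
    count = 0
    for _ in stream:
        last = clock()          # timestamp of the token just received
        if first is None:
            first = last        # the first token marks the end of prefill
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    if count > 1 and decode_time > 0:
        decode_tps = (count - 1) / decode_time
    else:
        decode_tps = float("nan")  # too few tokens to estimate throughput
    return {
        "tokens": count,
        "ttft_s": first - start,
        "decode_tok_per_s": decode_tps,
    }


def fake_backend(prefill_s: float, per_token_s: float, n_tokens: int) -> Iterator[str]:
    """Hypothetical stand-in for a real backend; it only sleeps."""
    time.sleep(prefill_s)
    for i in range(n_tokens):
        time.sleep(per_token_s)
        yield f"tok{i}"


if __name__ == "__main__":
    # Placeholder timings, not measurements from any real hardware.
    print(measure_stream(fake_backend(0.05, 0.01, 20)))
```

To compare setups, one would replace `fake_backend` with a generator that wraps each real runtime's streaming output, then run the same prompt and token budget through both.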