Google's QAT Quantization Is Broken; Unsloth UD Q4_K_XL Recommended as Workaround
June 8, 2026
For Google's QAT models, llama-quantize goes wrong in three ways: it quantizes the token embeddings to Q6_K instead of applying the intended pure quantization, it hardcodes a block group value of -7 where some groups require 8, and it misaligns the 32-block groups so that values from different groups intermingle. Unsloth's UD Q4_K_XL is effectively pure Q4_0 and avoids these issues while a patch is in progress.
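To make the last two failure modes concrete, here is a minimal toy sketch in Python. It is not llama.cpp's code and does not reproduce the actual bug. It assumes that QAT weights already sit on a signed 4-bit grid in blocks of 32, and it reads the "7 versus 8" issue as the divisor used to derive each block's scale; both are assumptions made for illustration. The names `round_trip` and `max_error` are hypothetical.

```python
BLOCK = 32  # values per block, matching the 32-value groups described above


def round_trip(block, qmax):
    """Quantize one block to signed 4-bit codes in [-8, 7], then dequantize.

    The scale is max|x| / qmax, where qmax is the divisor in question (7 or 8).
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block)
    scale = amax / qmax
    return [max(-8, min(7, round(x / scale))) * scale for x in block]


def max_error(values, qmax, offset=0):
    """Worst round-trip error when block boundaries start at `offset`."""
    starts = ([0] if offset else []) + list(range(offset, len(values), BLOCK))
    ends = starts[1:] + [len(values)]
    worst = 0.0
    for lo, hi in zip(starts, ends):
        chunk = values[lo:hi]
        restored = round_trip(chunk, qmax)
        worst = max(worst, max(abs(a - b) for a, b in zip(chunk, restored)))
    return worst


# Two toy "QAT" blocks: the same integer codes in [-8, 7], different scales.
codes = [(i % 16) - 8 for i in range(BLOCK)]
weights = [c * 0.01 for c in codes] + [c * 0.03 for c in codes]

aligned = max_error(weights, qmax=8)             # correct divisor, correct boundaries
wrong_divisor = max_error(weights, qmax=7)       # divisor 7 on blocks that contain -8
shifted = max_error(weights, qmax=8, offset=16)  # boundaries off by half a block

print("aligned, divisor 8:", aligned)
print("aligned, divisor 7:", wrong_divisor)
print("shifted by 16     :", shifted)
```

In this toy setup the aligned run with divisor 8 recovers the weights, while either a divisor of 7 or a shifted block boundary forces values onto a grid they were not trained for, which is the kind of degradation the summary describes.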
HOW THIS AFFECTS YOU
● Builder: If you're quantizing Google's recent QAT models with llama-quantize, the output is incorrect. Switch to Unsloth UD Q4_K_XL until an upstream patch lands.