[r/LocalLLaMA] score: 0.17

Google's QAT Quantization Is Broken; Unsloth UD Q4_K_XL Recommended as Workaround

June 8, 2026

llama-quantize reportedly mishandles Google's QAT models in three ways: it quantizes the token embeddings to Q6_K instead of applying the intended pure quantization, it hardcodes -7 for block group values where some groups require 8, and it misaligns the 32-block groups so that they intermingle. Unsloth's UD Q4_K_XL is effectively pure Q4_0 and avoids these issues while a patch is in progress.
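To see why group alignment matters for QAT weights, here is a minimal Python sketch. It is not llama-quantize code. It assumes that QAT weights already sit on a per-block 4-bit grid (one scale per block of 32 values, integer levels from -8 to 7), and every function name in it is hypothetical. Under those assumptions, requantizing with the original block boundaries reproduces the weights, while shifting the boundaries mixes values that were trained with different scales and introduces error.

```python
import random

BLOCK = 32  # values per quantization block (assumed here, as in a Q4_0-style layout)


def quantize_block(values):
    """Round one block to a 4-bit grid: one scale per block, integer levels in [-8, 7]."""
    peak = max(values, key=abs)  # signed value with the largest magnitude
    scale = peak / -8            # the peak maps exactly to level -8
    if scale == 0.0:
        return [0.0] * len(values)
    return [max(-8, min(7, round(v / scale))) * scale for v in values]


def requantize(weights, offset=0):
    """Quantize in blocks of BLOCK values; a nonzero offset shifts every block boundary."""
    out = quantize_block(weights[:offset]) if offset else []
    for start in range(offset, len(weights), BLOCK):
        out.extend(quantize_block(weights[start:start + BLOCK]))
    return out


def make_qat_like_weights(n_blocks, seed=0):
    """Hypothetical stand-in for QAT output: each block already lies on its own 4-bit grid."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n_blocks):
        scale = rng.uniform(0.01, 0.1)
        levels = [rng.randint(-8, 7) for _ in range(BLOCK)]
        levels[rng.randrange(BLOCK)] = -8  # make sure the block uses its full range
        weights.extend(q * scale for q in levels)
    return weights


def max_error(a, b):
    """Largest absolute difference between two equally long weight lists."""
    return max(abs(x - y) for x, y in zip(a, b))


weights = make_qat_like_weights(n_blocks=8)
aligned_error = max_error(weights, requantize(weights, offset=0))
shifted_error = max_error(weights, requantize(weights, offset=16))
print(f"aligned blocks: max error {aligned_error:.6f}")
print(f"shifted blocks: max error {shifted_error:.6f}")
```

The shift of 16 is arbitrary. Any offset that makes a block straddle two original blocks forces a single scale onto values that were trained with two different ones.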

HOW THIS AFFECTS YOU

● builder: If you're quantizing Google's recent QAT models with llama-quantize, the output is incorrect. Switch to Unsloth UD Q4_K_XL until an upstream patch lands.
