[r/LocalLLaMA]

NVIDIA Quantizes Qwen3.6-35B-A3B to NVFP4, Cuts Memory 3x

May 30, 2026

NVIDIA's NVFP4 quantization of Qwen3.6-35B-A3B cuts the model's GPU memory and disk footprint by 3.06x relative to BF16: the weights and activations of the linear layers in its MoE transformer blocks are quantized from 16 bits to 4 bits. The checkpoint, produced with NVIDIA Model Optimizer, is ready for inference with vLLM.
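As a rough illustration of what quantizing from 16 bits to 4 bits involves, the sketch below simulates NVFP4-style block scaling in pure Python: each run of 16 values shares one scale, and every scaled value is rounded to the nearest FP4 (E2M1) magnitude. This is a simplified fake-quantization sketch, not NVIDIA Model Optimizer's implementation. It keeps the block scale in full precision, whereas NVFP4 stores it as FP8 (E4M3) alongside a per-tensor FP32 scale, and its tie-breaking rule is an assumption. The names `round_to_e2m1` and `fake_quantize_nvfp4` are hypothetical.

```python
import math

# Non-negative magnitudes representable by FP4 E2M1 (a sign bit is stored separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 shares one scale across 16 consecutive values
FP4_MAX = 6.0    # largest E2M1 magnitude


def round_to_e2m1(x: float) -> float:
    """Round x to the nearest signed E2M1 value.

    min() returns the first minimum of this ascending list, so ties go to the
    smaller magnitude (a simplification; hardware rounding may differ).
    Magnitudes above 6.0 saturate to 6.0.
    """
    mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) - v))
    return math.copysign(mag, x)


def fake_quantize_nvfp4(values: list[float]) -> list[float]:
    """Quantize then dequantize block by block, exposing the rounding error."""
    out: list[float] = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        # Per-block scale maps the block's largest magnitude onto FP4_MAX.
        # Simplification: kept in full precision instead of FP8 (E4M3).
        scale = amax / FP4_MAX if amax > 0 else 1.0
        out.extend(round_to_e2m1(v / scale) * scale for v in block)
    return out
```

For example, `fake_quantize_nvfp4([i / 10 - 0.8 for i in range(16)])` maps the block's largest magnitude (0.8) onto 6.0, so it is reconstructed up to floating-point error, while every other value lands on the scaled E2M1 grid. Because the widest gap in that grid is between 4 and 6, the per-value error is at most one block scale; small blocks keep that scale, and therefore the error, tied to the local range of the data.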


HOW THIS AFFECTS YOU

● builder: You can deploy Qwen3.6-35B-A3B in roughly a third of its BF16 GPU memory footprint today, using vLLM with this drop-in NVFP4 checkpoint.

● researcher: Watch the accuracy-retention results on MMLU Pro, GPQA Diamond, AIME 2025, and SciCode to see how well 4-bit quantization holds up on an MoE model.
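To put the builder point into numbers, here is back-of-the-envelope weight-memory arithmetic. It assumes 35B parameters (read off the model name; the exact count may differ), counts weights only (no KV cache or activation memory, which vLLM also allocates), and charges NVFP4 4 bits per value plus one 8-bit scale per 16-element block, i.e. 4.5 bits per weight. The source does not say which layers stay in BF16, so `f_bf16` is only the share implied by this simplified two-precision model; all names here are hypothetical.

```python
def bits_per_weight_nvfp4(value_bits: int = 4, scale_bits: int = 8, block_size: int = 16) -> float:
    # 4-bit E2M1 value plus one 8-bit (E4M3) block scale amortized over the block.
    # The single per-tensor FP32 scale is negligible and ignored.
    return value_bits + scale_bits / block_size


def weight_gb(num_params: float, bits_per_param: float) -> float:
    # Decimal gigabytes of raw weight storage.
    return num_params * bits_per_param / 8 / 1e9


NOMINAL_PARAMS = 35e9       # assumption: taken from "35B" in the model name
REPORTED_REDUCTION = 3.06   # reduction vs BF16 reported for this checkpoint

bf16_gb = weight_gb(NOMINAL_PARAMS, 16)     # BF16 baseline
nvfp4_gb = bf16_gb / REPORTED_REDUCTION     # implied NVFP4 checkpoint size
bits = bits_per_weight_nvfp4()
ideal_reduction = 16 / bits                 # if every weight were NVFP4

# Solve 16 / (16 * f + bits * (1 - f)) == REPORTED_REDUCTION for f,
# the fraction of parameters this simplified model leaves in BF16.
f_bf16 = (16 / REPORTED_REDUCTION - bits) / (16 - bits)

print(f"BF16 weights:  {bf16_gb:.1f} GB")
print(f"NVFP4 weights: {nvfp4_gb:.1f} GB")
print(f"All-NVFP4 reduction: {ideal_reduction:.2f}x")
print(f"Implied BF16 share:  {f_bf16:.1%}")
```

Under these assumptions the BF16 weights take about 70 GB and the NVFP4 checkpoint about 22.9 GB. A model stored entirely in NVFP4 would shrink by about 3.56x, so the reported 3.06x is consistent with a small share of parameters (roughly 6% in this model) staying in higher precision.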

SOURCE

https://huggingface.co/nvidia/Qwen3.6-35B-A3B-NVFP4
