●builder You can now run a 428B multimodal MoE with a 1M-token context on Blackwell hardware at about half the memory overhead, using the NVFP4 checkpoint on Hugging Face.
●researcher Worth watching: NVFP4 quantization of a model this size, with a 1M-token context, gives a concrete data point on Blackwell-native precision formats and their fidelity tradeoffs.
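
For a rough sense of scale, the weight-memory side of the claim can be sketched with back-of-envelope arithmetic. This is an estimate under stated assumptions, not a measurement of the checkpoint: it assumes every one of the 428B parameters is quantized (real checkpoints often keep some layers in higher precision), and it models NVFP4 as 4-bit values plus one 8-bit scale per 16-element block. It ignores the KV cache, which matters a great deal at 1M tokens.

```python
# Back-of-envelope weight-memory estimate; all figures are assumptions, not measurements.
PARAMS = 428e9  # total parameter count quoted above


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# NVFP4 as assumed here: 4-bit values plus one 8-bit scale per 16-element block.
NVFP4_BITS = 4 + 8 / 16  # 4.5 effective bits per weight

estimates = {
    "bf16": weight_gb(PARAMS, 16),
    "fp8": weight_gb(PARAMS, 8),
    "nvfp4": weight_gb(PARAMS, NVFP4_BITS),
}

for name, gb in estimates.items():
    print(f"{name:>6}: {gb:,.0f} GB")
```

Under these assumptions the weights alone shrink by a bit under 2x relative to 8-bit storage and by well over 3x relative to 16-bit. The blurb does not say which baseline its figure refers to.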