[arXiv]
Multi-Scale Dequant: Eliminating Dequantization Bottleneck via Activation Decomposition for Efficient LLM Inference
May 15, 2026
Introduces Multi-Scale Dequant (MSD), a quantization framework that eliminates the dequantization bottleneck in quantized LLM inference on decoupled compute architectures (e.g., Ascend NPUs). By decomposing activations, MSD removes the overhead of dequantizing weights and the KV cache.
stat.ML, cs.AI, cs.LG
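The listing does not spell out MSD's algorithm. As a hypothetical illustration of why decomposing activations can make weight dequantization unnecessary, the Python sketch below assumes symmetric per-output-channel int8 weights and a simple residual decomposition of the activation vector into int8 parts at successively smaller scales. Every function name here (quantize_weights, decompose_activation, int_matvec, decomposed_linear, dequantized_reference) is invented for this sketch, and it is not the paper's method. It only shows the identity (sum_i s_i q_i) @ (Wq * w_scales) = sum_i s_i (q_i @ Wq) * w_scales, which keeps the matrix product purely integer and moves all scaling onto the short output vector.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
IntMatrix = List[List[int]]


def quantize_weights(w: Matrix, bits: int = 8) -> Tuple[IntMatrix, Vector]:
    """Symmetric per-output-column quantization: w[k][n] ~= scales[n] * q[k][n]."""
    qmax = 2 ** (bits - 1) - 1
    rows, cols = len(w), len(w[0])
    scales = []
    for n in range(cols):
        amax = max(abs(w[k][n]) for k in range(rows))
        scales.append(amax / qmax if amax > 0 else 1.0)
    q = [[round(w[k][n] / scales[n]) for n in range(cols)] for k in range(rows)]
    return q, scales


def decompose_activation(x: Vector, levels: int, bits: int = 8) -> List[Tuple[float, List[int]]]:
    """Hypothetical residual decomposition: x ~= sum_i scale_i * q_i, each q_i an int vector.

    Each level quantizes what the previous levels failed to represent, so the
    scales shrink roughly by a factor of 2 * qmax per level.
    """
    qmax = 2 ** (bits - 1) - 1
    residual = list(x)
    parts = []
    for _ in range(levels):
        amax = max(abs(r) for r in residual)
        scale = amax / qmax if amax > 0 else 1.0
        q = [round(r / scale) for r in residual]
        residual = [r - scale * qi for r, qi in zip(residual, q)]
        parts.append((scale, q))
    return parts


def int_matvec(xq: List[int], wq: IntMatrix) -> List[int]:
    """Integer-only product y[n] = sum_k xq[k] * wq[k][n]."""
    return [sum(xq[k] * wq[k][n] for k in range(len(xq))) for n in range(len(wq[0]))]


def decomposed_linear(x: Vector, wq: IntMatrix, w_scales: Vector, levels: int) -> Vector:
    """Compute x @ W without ever materializing dequantized weights.

    Activation scales and weight scales are applied to the output vector
    only, after the integer products have been accumulated.
    """
    out = [0.0] * len(w_scales)
    for scale, xq in decompose_activation(x, levels):
        partial = int_matvec(xq, wq)
        for n, p in enumerate(partial):
            out[n] += scale * p
    return [o * s for o, s in zip(out, w_scales)]


def dequantized_reference(x: Vector, wq: IntMatrix, w_scales: Vector) -> Vector:
    """Baseline that dequantizes every weight first (the step the sketch avoids)."""
    return [sum(x[k] * wq[k][n] * w_scales[n] for k in range(len(x)))
            for n in range(len(w_scales))]


if __name__ == "__main__":
    x = [0.37, -1.2, 2.5, 0.05]
    w = [[0.8, -0.1, 0.3], [0.2, 0.9, -0.4], [-0.5, 0.25, 0.6], [0.05, -0.7, 0.15]]
    wq, w_scales = quantize_weights(w)
    ref = dequantized_reference(x, wq, w_scales)
    for levels in (1, 2, 3):
        approx = decomposed_linear(x, wq, w_scales, levels)
        print(levels, max(abs(a - r) for a, r in zip(approx, ref)))
```

Running the script prints the maximum deviation from the dequantized baseline for one, two, and three activation levels; it shrinks quickly because each level captures the residual of the previous one. On a decoupled design, the integer products are the kind of work a dedicated matrix unit would presumably handle, while only the per-output scaling touches the vector unit; whether MSD partitions work this way is not stated in the listing.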