[r/LocalLLaMA]score: 0.09

Qwen3.6 Q6 Quant Enables Viable Local Coding Agents on Dual 3090s

May 27, 2026

A user reports Qwen3.6 at Q6 quantization running via llama.cpp on dual RTX 3090s achieves 20–50 tokens/second with MTP enabled, delivering quality comparable to paid APIs for coding agent tasks.

discussion

HOW THIS AFFECTS YOU

●

builderYou can run a competitive local coding agent on dual consumer GPUs using Qwen3.6 Q6 + llama.cpp with MTP, potentially replacing paid API calls for latency-sensitive or privacy-sensitive workflows.

SOURCE

https://www.reddit.com/r/LocalLLaMA/comments/1tpebhw/qwen36_huge_quality_gain_from_q4_to_q6_for_coding/

← back to feed