[r/LocalLLaMA]

Great results with Qwen3.6-35B-A3B-UD-Q5_K_XL + VS Code and Copilot

May 6, 2026
Qwen3.6-35B-A3B, a 35B-parameter Mixture-of-Experts model that activates only 3B parameters per token, is drawing serious attention from practitioners running it on consumer hardware. A Reddit engineer ran the Q5_K_XL GGUF quantization through llama.cpp's Vulkan backend on a single AMD RX 9700 with a 262K context window, flash attention, and a q8_0-quantized KV cache, and had the model generate a functional website plus a full Playwright test suite with minimal intervention. Because per-token compute scales with the active parameter count, inference cost rivals that of a dense 3B model while the full 35B of capacity is retained (all weights must still fit in memory), making it a compelling local alternative to GPT-4o for agentic coding workflows. VS Code Copilot integration via the llama.cpp server (llama-server) makes this immediately actionable for any engineer with a mid-range discrete GPU, roughly as sketched below.
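For readers who want to reproduce the setup, here is a minimal sketch of the server launch, assuming a llama.cpp build with the Vulkan backend and the quantized GGUF downloaded locally; the model filename, port, and layer-offload value are illustrative rather than taken from the post, and flag spellings can vary between llama.cpp versions.

```python
import subprocess

# Minimal sketch of the server launch described above. Assumes a llama.cpp
# build with the Vulkan backend and `llama-server` on PATH; the model
# filename, port, and -ngl value are illustrative, not from the post.
cmd = [
    "llama-server",
    "-m", "Qwen3.6-35B-A3B-UD-Q5_K_XL.gguf",  # hypothetical local filename
    "-c", "262144",             # 262K-token context window
    "-fa",                      # flash attention (spelling varies by version)
    "-ctk", "q8_0",             # q8_0-quantized KV cache (keys)
    "-ctv", "q8_0",             # q8_0-quantized KV cache (values)
    "-ngl", "99",               # offload all layers to the GPU
    "--host", "127.0.0.1",
    "--port", "8080",           # local endpoint for editor integrations
]
subprocess.run(cmd, check=True)
```

Once running, llama-server exposes an OpenAI-compatible HTTP API at the configured host and port, which editor-side integrations that accept a custom model endpoint can target; the post does not spell out the exact VS Code configuration steps.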
tutorial | guide