Norm-preserving abliteration maintains Qwen3.6-35B-A3B performance while removing refusals
June 30, 2026
The biprojection technique removes refusal directions from weight matrices by orthogonalizing each row against the refusal direction and then rescaling it to its original L2 norm. This prevents the decay in residual stream magnitude typical of vanilla abliteration, and it reaches a 0% refusal rate on Qwen3.6-35B-A3B without benchmark degradation.
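The row-wise step described above can be illustrated with a minimal NumPy sketch. This is not the released implementation: the function name `ablate_rows_norm_preserving`, the toy shapes, and the assumption that each row of `W` lives in the same space as the refusal direction are all hypothetical choices made for illustration. How the direction is extracted, and which matrices are edited, are outside this sketch.

```python
import numpy as np

def ablate_rows_norm_preserving(W, direction, eps=1e-12):
    """Project `direction` out of every row of W, then restore each row's L2 norm.

    Hypothetical helper for illustration only. Assumes W has shape (n, d)
    and `direction` has shape (d,).
    """
    r = direction / np.linalg.norm(direction)            # unit refusal direction
    orig_norms = np.linalg.norm(W, axis=1, keepdims=True)

    # Orthogonalize: subtract each row's component along r.
    projected = W - np.outer(W @ r, r)

    # Rescale each row back to its original norm. Rows that were (almost)
    # parallel to r have nothing left to rescale and stay at zero.
    new_norms = np.linalg.norm(projected, axis=1, keepdims=True)
    safe_norms = np.where(new_norms > eps, new_norms, 1.0)
    scale = np.where(new_norms > eps, orig_norms / safe_norms, 0.0)
    return projected * scale

# Toy data standing in for a weight matrix and a refusal direction.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
r = rng.normal(size=8)
W_new = ablate_rows_norm_preserving(W, r)
```

Plain orthogonalization alone would shrink every row that had a component along the direction. The rescale step is what keeps the row magnitudes, and therefore the magnitude of what those rows contribute, unchanged.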
HOW THIS AFFECTS YOU
● Builder: You can bypass safety refusals in MoE models without sacrificing general reasoning capabilities.
● Researcher: This provides a method to mitigate the unintended side effects of weight manipulation based on mechanistic interpretability.