[HUGGINGFACE] score: 0.88
Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection
May 11, 2026
Safety alignment degrades general utility through gradient interference between sequential post-training stages; orthogonal gradient projection mitigates this alignment tax by preserving directions supporting pre-existing capabilities.
paper
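
The summary above mentions orthogonal gradient projection without showing the operation. As a rough illustration only, and not the paper's implementation, the sketch below shows the generic idea in pure Python: build an orthonormal basis for a set of "protected" gradient directions, then strip those components from a new update. The names `capability_grads`, `safety_grad`, `orthonormalize`, and `project_orthogonal` are hypothetical, and the three-dimensional vectors are toy values chosen for readability.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors: List[Vector], eps: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        # Subtract the components already covered by the basis so far.
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        # Skip vectors that are (numerically) linearly dependent.
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def project_orthogonal(grad: Vector, basis: List[Vector]) -> Vector:
    """Remove from `grad` its component along each orthonormal basis vector."""
    g = list(grad)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g


# Hypothetical toy data: directions assumed to support existing capabilities,
# and a raw gradient from a later safety-alignment step.
capability_grads = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
safety_grad = [0.5, -2.0, 3.0]

basis = orthonormalize(capability_grads)
projected = project_orthogonal(safety_grad, basis)
print(projected)
```

In this toy case the projected update keeps only the component that lies outside the span of the protected directions, so a step along it leaves those directions unchanged to first order. How the protected subspace is chosen and stored at model scale is left open here.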