[arXiv] score: 0.09

Two Sparsity Regularizers Improve Top-k SAE Flexibility for Mechanistic Interpretability

June 26, 2026

Top-k sparse autoencoders gain two new regularizers, an L1 penalty on off-support units and a scale-invariant variant of it. Both address the fixed-budget limitation (every input receives exactly k active features, regardless of its complexity) and k-overfitting, without abandoning the Top-k architecture. Applied to vision foundation model interpretability, the approach yields a more input-adaptive feature decomposition.
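The summary does not give the paper's exact formulas, so the following is only a minimal sketch of what an off-support L1 penalty and a scale-invariant variant could look like for a single input. The function name, the use of ReLU before selection, and the choice to normalize by the L2 norm of all positive pre-activations are assumptions made for illustration, not the authors' definitions.

```python
import math

def topk_with_off_support_penalties(pre_acts, k, eps=1e-8):
    """Hypothetical helper: Top-k code plus two off-support penalties.

    Returns (codes, l1_off, scale_invariant_off):
      codes: ReLU'd activations with everything outside the Top-k zeroed
      l1_off: sum of the positive activations that Top-k discarded
      scale_invariant_off: l1_off divided by the L2 norm of all positive
        activations (an ASSUMED normalization, not taken from the paper)
    """
    relu = [max(a, 0.0) for a in pre_acts]

    # The k largest activations form the support that Top-k keeps.
    order = sorted(range(len(relu)), key=lambda i: relu[i], reverse=True)
    support = set(order[:k])
    codes = [v if i in support else 0.0 for i, v in enumerate(relu)]

    # L1 penalty on the units that fall outside the support.
    l1_off = sum(v for i, v in enumerate(relu) if i not in support)

    # Dividing by a norm makes the penalty unchanged when the
    # activations are rescaled by a positive constant.
    l2_all = math.sqrt(sum(v * v for v in relu))
    scale_invariant_off = l1_off / (l2_all + eps)

    return codes, l1_off, scale_invariant_off


codes, l1_off, si_off = topk_with_off_support_penalties(
    [3.0, -1.0, 0.5, 2.0, 0.25], k=2
)
print(codes)   # Top-k sparse code
print(l1_off)  # mass left outside the support
print(si_off)  # same quantity, normalized
```

In a training loop, either penalty would be added to the reconstruction loss with a weighting coefficient (a hypothetical hyperparameter here). Pushing off-support activations toward zero means the code stays close to the same if k is changed at inference time, which is one plausible reading of how such a term reduces sensitivity to k.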

HOW THIS AFFECTS YOU

● Researcher: If you're using SAEs for mechanistic interpretability, these regularizers are drop-in additions to Top-k architectures that reduce sensitivity to the chosen k hyperparameter.
