Two Sparsity Regularizers Improve Top-k SAE Flexibility for Mechanistic Interpretability
June 26, 2026
Top-k sparse autoencoders (SAEs) gain two new regularizers, an L1 penalty on off-support units and a scale-invariant variant, that address the fixed-budget limitation (every input receives exactly k active features) and k-overfitting without abandoning the Top-k architecture. Applied to vision foundation model interpretability, the approach yields a more input-adaptive feature decomposition.
HOW THIS AFFECTS YOU
● Researcher: If you're using SAEs for mechanistic interpretability, these regularizers are drop-in additions to Top-k architectures that reduce sensitivity to the chosen k hyperparameter.
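
To make the "drop-in" idea concrete, here is a minimal pure-Python sketch of how an off-support penalty could sit alongside a Top-k activation for a single input. It is an illustration under stated assumptions, not the authors' implementation: the function names (`top_k_code`, `off_support_l1`, `off_support_ratio`), the use of ReLU before selection, and especially the ratio form of the scale-invariant variant are assumptions made here, since the summary does not give the exact formulas.

```python
def top_k_code(pre_acts, k):
    """Keep the k largest ReLU activations; zero out the rest.

    Returns (code, support), where support is the set of kept indices.
    Assumption: ReLU is applied before the Top-k selection.
    """
    acts = [max(a, 0.0) for a in pre_acts]
    # Indices sorted by activation, largest first (ties keep index order).
    order = sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)
    support = set(order[:k])
    code = [a if i in support else 0.0 for i, a in enumerate(acts)]
    return code, support


def off_support_l1(pre_acts, k):
    """L1 mass of the activations that Top-k discarded (the off-support units)."""
    acts = [max(a, 0.0) for a in pre_acts]
    _, support = top_k_code(pre_acts, k)
    return sum(a for i, a in enumerate(acts) if i not in support)


def off_support_ratio(pre_acts, k, eps=1e-12):
    """Hypothetical scale-invariant variant: off-support L1 mass divided by
    total L1 mass. Rescaling all pre-activations by a positive constant
    leaves this value (almost exactly) unchanged. This form is an assumption."""
    total = sum(max(a, 0.0) for a in pre_acts)
    return off_support_l1(pre_acts, k) / (total + eps)


# Usage: one input with five latent units and a budget of k = 2.
pre_acts = [2.0, -1.0, 0.5, 1.5, 0.25]
code, support = top_k_code(pre_acts, k=2)
penalty = 0.1 * off_support_l1(pre_acts, k=2)  # 0.1 is an arbitrary coefficient
ratio = off_support_ratio(pre_acts, k=2)
```

In a training loop, a term like `penalty` or `ratio` would be added to the reconstruction loss. The intuition is that pushing off-support activations toward zero makes the code less dependent on exactly where the k cutoff falls, which is one plausible reading of the reduced sensitivity to k described above.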