[arXiv] score: 0.41

WriteSAE: Sparse Autoencoders for Recurrent State

May 14, 2026

WriteSAE introduces the first sparse autoencoder (SAE) that targets matrix cache writes in state-space and hybrid recurrent LLMs such as Mamba-2, Gated DeltaNet, and RWKV-7, a part of these models that residual-stream SAEs cannot reach. It factors decoder atoms into the models' native rank-1 write shapes, provides a closed-form per-token logit shift, and trains with a Frobenius-norm objective. The work is most relevant to mechanistic interpretability researchers working beyond transformer architectures.
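To make the idea of rank-1 decoder atoms and a Frobenius-norm objective concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary above does not give the parameterization, so the names `reconstruct_write` and `frobenius_sq`, the factor lists `U` and `V`, and the toy dimensions are all assumptions made for illustration. It assumes each atom is an outer product of two vectors and that a sparse code weights those atoms to approximate one matrix-valued state write.

```python
# Illustrative sketch only: names, shapes, and values are assumptions,
# not taken from the WriteSAE paper.

def reconstruct_write(codes, U, V):
    """Approximate a matrix write as sum_k codes[k] * outer(U[k], V[k])."""
    d_k, d_v = len(U[0]), len(V[0])
    W_hat = [[0.0] * d_v for _ in range(d_k)]
    for c, u, v in zip(codes, U, V):
        if c == 0.0:  # sparse code: inactive atoms contribute nothing
            continue
        for i in range(d_k):
            for j in range(d_v):
                W_hat[i][j] += c * u[i] * v[j]
    return W_hat

def frobenius_sq(A, B):
    """Squared Frobenius norm of A - B, i.e. the summed squared entry error."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Three hypothetical atoms; atom k is the rank-1 matrix outer(U[k], V[k]).
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, -1.0]]

# A sparse code that activates atoms 0 and 2 only.
codes = [2.0, 0.0, 0.5]

# Toy target write, built by hand from the same two atoms.
target = [[2.5, 4.0, -0.5], [0.5, 0.0, -0.5]]

W_hat = reconstruct_write(codes, U, V)
loss = frobenius_sq(target, W_hat)
```

In this toy case the target is built from the same two atoms, so the reconstruction is exact and the loss is zero. In a real training loop the codes would come from an encoder and the factors would be learned, neither of which is shown here.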

cs.LG, cs.AI, cs.CL

SOURCE

https://arxiv.org/abs/2605.12770
