Bag of Dims: Training-Free Interpretability via Sign Patterns in Transformer Hidden States
June 16, 2026
Sign patterns of individual dimensions in transformer hidden states form a training-free feature basis that preserves 60–93% top-5 next-token accuracy across seven models, including Qwen3-32B, Gemma 3-4B, DINOv2, and an audio transformer. No learned rotation or sparse autoencoder is needed: features are read by counting sign agreements.
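To make "counting sign agreements" concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the helper names `sign_pattern` and `sign_agreement`, the toy 6-dimensional vectors, and the idea of scoring a hidden state against a small set of reference vectors are all assumptions made for illustration.

```python
import numpy as np

def sign_pattern(h):
    """Map each dimension to +1 or -1 (zeros are treated as +1 here, an arbitrary choice)."""
    return np.where(np.asarray(h) >= 0, 1, -1)

def sign_agreement(h, reference):
    """Count the dimensions in which h and reference share a sign."""
    return int(np.sum(sign_pattern(h) == sign_pattern(reference)))

# Hypothetical reference vectors, one per candidate feature (toy data, not from the paper).
references = np.array([
    [ 1.0, -1.0, 1.0, -1.0,  1.0, -1.0],
    [-1.0, -1.0, 1.0,  1.0, -1.0,  1.0],
    [ 1.0,  1.0, 1.0,  1.0,  1.0,  1.0],
])

# Hypothetical hidden-state vector; only the signs matter, not the magnitudes.
h = np.array([0.3, -2.0, 0.7, -0.1, 1.5, 0.4])

scores = [sign_agreement(h, r) for r in references]
best = int(np.argmax(scores))
print(scores, best)
```

Because only signs are compared, the score does not change when any dimension is rescaled by a positive factor, and nothing has to be fitted before the comparison.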
HOW THIS AFFECTS YOU
● Researcher: Offers a zero-cost, architecture-general interpretability method, validated across language, vision, and audio models, that could replace or complement SAE-based approaches.