[HUGGINGFACE] score: 0.55

Bag of Dims: Training-Free Interpretability via Sign Patterns in Transformer Hidden States

June 16, 2026

Sign patterns of individual dimensions in transformer hidden states form a training-free feature basis that preserves 60–93% top-5 next-token accuracy across seven models, including Qwen3-32B, Gemma 3-4B, DINOv2, and an audio transformer. No learned rotation or sparse autoencoder is needed: features are read by counting sign agreements.
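As a rough illustration of what "counting sign agreements" could look like, the sketch below reduces each hidden-state vector to its per-dimension signs and scores a query against a set of reference vectors by the number of dimensions whose signs match. This summary does not spell out the original method's details, so everything here is an assumption made for illustration: the function names (`sign_pattern`, `sign_agreement`, `top_k_by_agreement`), the idea of comparing against stored reference vectors, and the convention that zero counts as positive.

```python
import numpy as np


def sign_pattern(h: np.ndarray) -> np.ndarray:
    """Map each dimension to +1 or -1 (zero is treated as +1 by assumption)."""
    return np.where(h >= 0, 1, -1).astype(np.int8)


def sign_agreement(references: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Count, per reference row, the dimensions whose sign matches the query.

    references: shape (n, d); query: shape (d,). Returns shape (n,).
    """
    return (sign_pattern(references) == sign_pattern(query)).sum(axis=-1)


def top_k_by_agreement(references: np.ndarray, query: np.ndarray, k: int = 5):
    """Return indices and scores of the k references with the most sign matches."""
    scores = sign_agreement(references, query)
    order = np.argsort(-scores, kind="stable")[:k]
    return order, scores[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for hidden states; real ones would come from a model.
    references = rng.standard_normal((50, 64))
    query = 2.0 * references[7]  # same signs as row 7, different magnitudes
    idx, scores = top_k_by_agreement(references, query, k=5)
    print(idx, scores)
```

Because only signs are compared, rescaling a vector by any positive factor leaves its scores unchanged, and nothing in the sketch is trained or fitted.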

HOW THIS AFFECTS YOU

● Researcher: Offers a zero-cost, architecture-general interpretability method, validated across language, vision, and audio models, that could replace or complement SAE-based approaches.

