Probing LLM Activations Reveals Math Operations Encoded as Vectors
June 5, 2026
Linear probes trained on frozen LLM activations can decode the arithmetic operation type and its operands (e.g., gcd(84,36)) from hidden states, showing that this information is linearly readable. Critically, this establishes representational encoding but not a causal role: the probed directions may not drive model behavior.
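To make "linearly readable" concrete, here is a minimal sketch of a linear probe. It does not use a real model: the hidden states are synthetic Gaussian vectors, and the width `DIM`, the encoding `direction`, the shift of 3.0, and the two labels (1 for gcd, 0 for some other operation) are all assumptions chosen for illustration. The probe itself is plain logistic regression trained on these frozen vectors.

```python
import math
import random

random.seed(0)
DIM = 8  # hypothetical hidden-state width

# Assumed unit direction along which the operation type is encoded.
direction = [1.0 / math.sqrt(DIM)] * DIM

def fake_activation(label):
    """Synthetic hidden state: unit Gaussian noise plus a shift of
    +3 (label 1, e.g. gcd) or -3 (label 0) along `direction`."""
    sign = 1.0 if label == 1 else -1.0
    return [random.gauss(0.0, 1.0) + 3.0 * sign * d for d in direction]

def make_dataset(n):
    labels = [i % 2 for i in range(n)]
    return [fake_activation(y) for y in labels], labels

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_probe(xs, ys, epochs=50, lr=0.1):
    """Logistic-regression probe fitted with stochastic gradient descent."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            grad = sigmoid(score(w, b, x)) - y
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def accuracy(w, b, xs, ys):
    hits = sum((score(w, b, x) > 0) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(ys)

train_x, train_y = make_dataset(400)
test_x, test_y = make_dataset(200)
w, b = train_probe(train_x, train_y)
probe_accuracy = accuracy(w, b, test_x, test_y)
print(f"held-out probe accuracy: {probe_accuracy:.3f}")
```

High held-out accuracy here shows only that the label can be recovered by a linear map from the vectors. It says nothing about whether any downstream computation uses that direction.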
HOW THIS AFFECTS YOU
● Researcher: Worth watching because it sharpens the distinction between correlation and causation in activation probing, a methodological point relevant to mechanistic interpretability work.
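One common way to test for a causal role is to ablate the probed direction and check whether the output changes. The toy sketch below illustrates the gap under stated assumptions: the 3-dimensional hidden state `h`, the `probe_direction`, and the `readout_weights` standing in for "the model" are all hypothetical, and the two directions are deliberately orthogonal.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ablate(h, unit_direction):
    """Project out the component of h along a unit-norm direction."""
    coeff = dot(h, unit_direction)
    return [hi - coeff * di for hi, di in zip(h, unit_direction)]

# Hypothetical setup: the probe reads one axis, the toy readout uses another.
probe_direction = [1.0, 0.0, 0.0]   # direction a linear probe decodes
readout_weights = [0.0, 1.0, 0.0]   # direction the toy "model" output depends on

h = [2.5, -1.0, 0.3]                # hypothetical hidden state

probe_before = dot(h, probe_direction)      # information is present
output_before = dot(h, readout_weights)

h_ablated = ablate(h, probe_direction)
probe_after = dot(h_ablated, probe_direction)    # information removed
output_after = dot(h_ablated, readout_weights)   # output unaffected

print(probe_before, probe_after, output_before, output_after)
```

In this constructed case the probed value is fully decodable before ablation and gone afterwards, yet the output is identical. A real model may behave differently, which is why an intervention of this kind is needed before claiming that a probed direction drives behavior.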