[arXiv] score: 0.37
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces
May 14, 2026
A new token-level influence function framework for LLMs addresses two limitations of prior work: restriction to autoregressive settings and the false assumption that tokens are independent. The method uses an orthogonal decomposition of the latent space to isolate which training tokens drive a specific prediction, a capability that matters for auditing healthcare AI. This moves interpretability beyond example-level attribution toward fine-grained causal tracing.
cs.LG cs.AI