[arXiv] score: 0.37

Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

May 14, 2026

A new token-level influence-function framework for LLMs addresses two limitations of prior work: restriction to autoregressive settings and the incorrect assumption that tokens are independent of one another. The method uses an orthogonal latent space decomposition to isolate which training tokens drive a specific prediction, a capability critical for auditing healthcare AI systems. It moves interpretability beyond example-level attribution toward fine-grained causal tracing.

cs.LG, cs.AI

SOURCE

https://arxiv.org/abs/2605.12809
