[arXiv]

A Definition of Good Explanations and the Challenges Explaining LLM Outputs

June 16, 2026

The paper defines a good explanation as a counterfactual statement weighted by the interlocutor's prior beliefs. This definition creates a structural problem for LLM explainability: an LLM output depends on billions of parameters with no discrete causal chain, so satisfying the counterfactual and belief-updating criteria at the same time is intractable. The paper argues that this is not a tooling gap but a fundamental mismatch between how LLMs generate outputs and what explanations require.
