OT-Based Hallucination Detection Peaks at Layers L1–L4, Fails on Summarization
June 12, 2026
Optimal transport metrics applied layer-by-layer to a Fairseq DE-EN decoder show hallucination detection concentrates in layers L1–L4, with L5 anti-predictive for subtle errors. On abstractive summarization (AggreFact), the unsupervised OT detector hits only 57.2–57.6% balanced accuracy on CNN/XSum, well below supervised baselines like MiniCheck-Flan-T5-L.
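For context on the 57.2–57.6% figures: balanced accuracy is the mean of per-class recall, so a binary detector that labels every summary the same way scores 50% no matter how imbalanced the classes are. The sketch below is a minimal illustration using made-up toy labels, not AggreFact data, and the function name `balanced_accuracy` is chosen here for illustration only.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of recall on the positive class and recall on the negative class."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    # True positives and true negatives, counted pairwise.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# Toy labels (hypothetical): 1 = faithful summary, 0 = hallucinated summary.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
always_faithful = [1] * len(y_true)

# Plain accuracy would be 80% here, but balanced accuracy is at chance level.
print(balanced_accuracy(y_true, always_faithful))  # 0.5
```

Read against that 50% floor, a score in the 57% range is only modestly better than a trivial detector.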
HOW THIS AFFECTS YOU
● Researcher: Layer-resolved OT signals offer a supervision-free diagnostic for NMT hallucinations, but the poor transfer to summarization faithfulness detection limits broader applicability.