[arXiv] score: 0.20

HIVE: Hidden-Evidence Verification for Hallucination Detection in Diffusion Large Language Models

April 30, 2026
HIVE is a hallucination detection framework designed for diffusion large language models (D-LLMs) such as MDLM and Plaid. Rather than inspecting only the final output, it targets the multi-step denoising trajectory: it extracts compressed hidden-state evidence across denoising steps and layers, feeds selected evidence into a verifier LM via prefix embeddings, and outputs continuous hallucination scores along with structured rationales. Evaluated across three QA benchmarks, HIVE reports 0.9236 AUROC and 0.9537 AUPRC, outperforming eight baselines that include output-uncertainty and trace-statistic methods. For teams deploying D-LLMs in production, this suggests preferring trajectory-aware detection over detectors built for autoregressive models, since trajectory-level hidden dynamics expose hallucination signals that output-only approaches cannot see.
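For readers less familiar with the two reported metrics, the following is a minimal sketch of how AUROC and AUPRC can be computed from continuous hallucination scores and binary labels. It is not HIVE's evaluation code: the function names, the toy scores, and the toy labels are all made up for illustration, and AUPRC is approximated here by average precision, which is one common estimator among several.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision measured at the rank of each positive example.

    Examples are ranked by descending score. Tied scores keep their input
    order, so this sketch assumes scores are distinct.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Toy data (hypothetical): label 1 marks a hallucinated answer.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_labels = [1, 0, 1, 1, 0]

print(round(auroc(toy_scores, toy_labels), 3))              # 0.667
print(round(average_precision(toy_scores, toy_labels), 3))  # 0.806
```

Both metrics depend only on how the detector ranks examples, not on any fixed decision threshold, which is why they suit a detector that emits continuous scores. AUPRC is the more sensitive of the two when hallucinated answers are rare.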
cs.CL