[HUGGINGFACE] score: 0.71
Trajel Dataset Exposes Trajectory-Level Hallucinations Final-Output Benchmarks Miss
May 25, 2026
Trajel introduces a five-type hallucination taxonomy (factual, referential, logical, procedural, scope-based) over annotated multi-step agent traces, finding that nearly half of hallucinations in industrial workflows occur in intermediate steps invisible to final-output evaluations.
paper
HOW THIS AFFECTS YOU
● builder: If you check only final agent outputs, your hallucination detection is likely missing the nearly half of hallucinations that originate in intermediate Thought-Action-Observation steps.
● researcher: The trajectory-level annotation and five-type taxonomy provide a more granular evaluation framework than existing hallucination benchmarks for multi-agent systems.
● policy: Worth watching: industrial agentic deployments have systematic hallucination failure modes that standard benchmarks don't surface, which is relevant for audit and compliance frameworks.
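To make the trajectory-level idea concrete, here is a minimal sketch of how step-level annotations might be represented and how the share of intermediate-step hallucinations could be computed. The `Step` schema, the field names, and the `intermediate_share` helper are all hypothetical illustrations and are not taken from the Trajel dataset or paper. Only the five type names come from the summary above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HallucinationType(Enum):
    # The five types named in the summary; the enum itself is an assumption.
    FACTUAL = "factual"
    REFERENTIAL = "referential"
    LOGICAL = "logical"
    PROCEDURAL = "procedural"
    SCOPE_BASED = "scope-based"


@dataclass
class Step:
    # One Thought-Action-Observation step (hypothetical schema).
    thought: str
    action: str
    observation: str
    hallucination: Optional[HallucinationType] = None  # None means a clean step


def intermediate_share(trajectory: list[Step]) -> Optional[float]:
    """Fraction of annotated hallucinations located before the final step.

    Returns None when the trajectory has no annotated hallucinations.
    """
    flagged = [i for i, s in enumerate(trajectory) if s.hallucination is not None]
    if not flagged:
        return None
    last = len(trajectory) - 1
    return sum(1 for i in flagged if i < last) / len(flagged)


# Made-up example trace, not real data.
example = [
    Step("Look up the order", "get_order(id=17)", "order found"),
    Step("Customer is in the EU", "apply_vat()", "VAT applied",
         HallucinationType.FACTUAL),
    Step("Summarize result", "respond()", "reply sent",
         HallucinationType.SCOPE_BASED),
]
print(intermediate_share(example))  # 0.5
```

A final-output check would see only the last step of `example`, so under this sketch it would catch one of the two annotated hallucinations and miss the factual error introduced in the second step.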