Trajel Dataset Exposes Trajectory-Level Hallucinations Final-Output Benchmarks Miss

May 25, 2026

Trajel introduces a five-type hallucination taxonomy (factual, referential, logical, procedural, scope-based) over annotated multi-step agent traces, finding that nearly half of hallucinations in industrial workflows occur in intermediate steps invisible to final-output evaluations.

HOW THIS AFFECTS YOU

● Builder: If your hallucination detection checks only final agent outputs, it is likely missing the roughly half of failures that originate in intermediate Thought-Action-Observation steps.

● Researcher: The trajectory-level annotation and five-type taxonomy provide a more granular evaluation framework than existing hallucination benchmarks for multi-agent systems.

● Policy: Worth watching: industrial agentic deployments have systematic hallucination failure modes that standard benchmarks don't surface, which is relevant for audit and compliance frameworks.
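
To make the builder point concrete, here is a minimal sketch of step-level labeling over a Thought-Action-Observation trace. It is not the Trajel schema or tooling: the `Step` class, the `intermediate_share` helper, and the toy trace are all hypothetical, and only the five type names come from the summary above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class HallucinationType(Enum):
    # The five types named in the summary; the enum itself is an assumption.
    FACTUAL = "factual"
    REFERENTIAL = "referential"
    LOGICAL = "logical"
    PROCEDURAL = "procedural"
    SCOPE_BASED = "scope-based"


@dataclass
class Step:
    # Hypothetical record for one step of an agent trace.
    kind: str  # "thought", "action", "observation", or "final"
    text: str
    label: Optional[HallucinationType] = None  # None = no hallucination annotated


def intermediate_share(trace: List[Step]) -> float:
    """Fraction of annotated hallucinations that sit in non-final steps."""
    flagged = [s for s in trace if s.label is not None]
    if not flagged:
        return 0.0
    intermediate = [s for s in flagged if s.kind != "final"]
    return len(intermediate) / len(flagged)


# Toy trace invented for illustration only; it is not taken from the dataset.
toy_trace = [
    Step("thought", "The invoice ID is in the last email.", HallucinationType.REFERENTIAL),
    Step("action", "lookup_invoice(id='A-17')"),
    Step("observation", "Invoice A-17 not found."),
    Step("final", "Invoice A-17 was paid.", HallucinationType.FACTUAL),
]
share = intermediate_share(toy_trace)
```

A detector that inspects only the `final` step of this toy trace would see one of the two labeled hallucinations; the referential error in the first thought would go unnoticed.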

SOURCE

https://huggingface.co/papers/2605.24219
