● builder: If you are building RAG or grounded advisory systems for industrial or maintenance use cases, this benchmark exposes a specific failure mode (mapped-but-wrong responses) that abstention metrics alone won't catch.
● researcher: The dataset formalizes out-of-scope handling as an evaluation axis distinct from hallucination, which makes it useful for benchmarking grounding strategies.
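
To make the point about abstention metrics concrete, here is a minimal Python sketch with toy data. It rests on assumptions, not on the benchmark's own definitions: the `Record` fields, both metric functions, and the reading of "mapped-but-wrong" as "the system answered instead of abstaining, but the answer was incorrect" are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical evaluation record; field names are assumptions.
    in_scope: bool   # is the query answerable from the grounding sources?
    abstained: bool  # did the system decline to answer?
    correct: bool    # if it answered, did the answer match the reference?


def abstention_recall(records):
    """Share of out-of-scope queries on which the system abstained."""
    out_of_scope = [r for r in records if not r.in_scope]
    if not out_of_scope:
        return 0.0
    return sum(r.abstained for r in out_of_scope) / len(out_of_scope)


def mapped_but_wrong_rate(records):
    """Share of answered (non-abstained) queries whose answer was incorrect."""
    answered = [r for r in records if not r.abstained]
    if not answered:
        return 0.0
    return sum(not r.correct for r in answered) / len(answered)


# Toy data: the system abstains on every out-of-scope query,
# yet half of the answers it does give are wrong.
records = [
    Record(in_scope=False, abstained=True, correct=False),
    Record(in_scope=False, abstained=True, correct=False),
    Record(in_scope=True, abstained=False, correct=True),
    Record(in_scope=True, abstained=False, correct=False),
    Record(in_scope=True, abstained=False, correct=False),
    Record(in_scope=True, abstained=False, correct=True),
]

print(abstention_recall(records))      # 1.0
print(mapped_but_wrong_rate(records))  # 0.5
```

In this toy case the abstention score is perfect while half of the delivered answers are wrong, which is why the two quantities are worth reporting separately.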