
LingxiDiag-16K: Multi-Agent Benchmark for Chinese Psychiatric LLM Evaluation

June 10, 2026

LingxiDiagBench introduces a synthetic dataset of 16,000 Chinese psychiatric consultation dialogues aligned with electronic medical records (EMR), and evaluates LLMs on both static diagnostic inference and dynamic multi-turn consultation. The benchmark addresses the shortage of clinician-verified, demographically representative evaluation data for psychiatric AI.

HOW THIS AFFECTS YOU

● Researcher: Provides a structured multi-turn evaluation framework for psychiatric LLMs with clinician-verified labels, useful for benchmarking dialogue agents in clinical settings.

● Health: Worth watching as a rare large-scale Chinese-language psychiatric benchmark with realistic clinical distributions, relevant for teams building mental health AI tools.

read original ↗huggingface.co
