[arXiv] score: 0.15
MedSP1000: 1,638 Standardized Patient Cases for Multi-Turn Clinical Agent Eval
June 4, 2026
MedSP1000 converts 1,638 standardized patient teaching cases into executable closed-loop clinical scenarios with 24,602 trajectory-level peer-reviewed rubrics, enabling dynamic multi-turn evaluation of LLM clinical agents across information gathering, treatment planning, and longitudinal management.
cs.CL
HOW THIS AFFECTS YOU
● researcher: The rubric-validated trajectory-level evaluation design addresses a real gap in single-turn clinical benchmarks and provides a reusable framework for agent assessment.
● health: MedSP1000 gives clinical AI developers a more realistic benchmark for evaluating LLM agents on dynamic patient encounters rather than static QA tasks.