CHI-Bench: 75 Healthcare Workflows and 1,290 Skills for AI Agent Evaluation
May 25, 2026
CHI-Bench is a long-horizon healthcare AI agent benchmark covering 75 real workflows, 20 apps, 200+ MCP tools, and 1,290 skills with both process and outcome rewards.
HOW THIS AFFECTS YOU
● Researcher: Provides a structured, multi-step evaluation framework for healthcare agents that goes beyond single-turn QA, with process-level reward signals.
● Health: This changes how clinical AI agents can be rigorously evaluated against realistic end-to-end healthcare workflows rather than isolated tasks.