CHI-Bench: 75 Healthcare Workflows and 1,290 Skills for AI Agent Evaluation

May 25, 2026

CHI-Bench is a long-horizon healthcare AI agent benchmark covering 75 real workflows, 20 apps, 200+ MCP tools, and 1,290 skills with both process and outcome rewards.

HOW THIS AFFECTS YOU

● Researcher: Provides a structured, multi-step evaluation framework for healthcare agents that goes beyond single-turn QA, with process-level reward signals.

● Health: This changes how clinical AI agents can be rigorously evaluated against realistic end-to-end healthcare workflows rather than isolated tasks.

SOURCE

https://x.com/iscreamnearby/status/2059082688687190513#m
