[arXiv] score: 0.38

ABRA: Agent Benchmark for Radiology Applications

May 13, 2026
ABRA is a radiology-agent benchmark of 655 programmatically generated tasks across three difficulty tiers. Agents operate a real OHIF viewer and Orthanc DICOM server through 21 function-calling tools that cover slice navigation, windowing, annotation, and structured reporting. Unlike prior benchmarks that use static, pre-selected images, ABRA provides a live, navigable environment, which makes the evaluation of clinical AI agents more demanding. Developers of radiology AI and medical agents can use it for more realistic deployment-readiness testing.
cs.CV cs.AI