[arXiv] score: 0.13

Indonesian Radiology VQA Benchmark Reveals Language Robustness Gaps in Medical VLMs

June 3, 2026

IndoRad-VQA adapts the VQA-RAD radiology benchmark to Bahasa Indonesia, exposing accuracy drops and failure modes, including yes/no flips and laterality errors, when medical vision-language models (VLMs) are prompted in Indonesian rather than English. The evaluation covers general-purpose, Southeast Asian multilingual, and medical-specific VLMs.
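To make the failure modes named above concrete, here is a minimal Python sketch of how an accuracy drop, a yes/no flip, and a laterality flip could be counted when the same question is answered in English and in Indonesian. It is an illustration only, not the paper's scoring procedure: the keyword lexicon, the record fields (`gold`, `pred_en`, `pred_id`), and the function names are all assumptions made for this example.

```python
import re

# Hypothetical keyword lexicon (an assumption for this sketch, not taken from the paper).
LEXICON = {
    "yes": {"yes", "ya", "iya"},
    "no": {"no", "tidak", "bukan"},
    "left": {"left", "kiri"},
    "right": {"right", "kanan"},
}


def label(answer):
    """Map a free-text answer to 'yes', 'no', 'left' or 'right'.

    Returns None when the answer matches no category or more than one.
    """
    words = set(re.findall(r"[a-z]+", answer.lower()))
    hits = [name for name, keys in LEXICON.items() if words & keys]
    return hits[0] if len(hits) == 1 else None


def robustness_report(records):
    """Compare English and Indonesian predictions for the same questions.

    Each record is a dict with the hypothetical keys 'gold', 'pred_en', 'pred_id'.
    """
    n = len(records)
    correct_en = correct_id = yn_flips = side_flips = 0
    for r in records:
        gold = label(r["gold"])
        en = label(r["pred_en"])
        idn = label(r["pred_id"])
        if gold is not None:
            correct_en += en == gold
            correct_id += idn == gold
        # A flip: the two languages give opposite answers to one question.
        if {en, idn} == {"yes", "no"}:
            yn_flips += 1
        if {en, idn} == {"left", "right"}:
            side_flips += 1
    acc_en = correct_en / n
    acc_id = correct_id / n
    return {
        "acc_en": acc_en,
        "acc_id": acc_id,
        "acc_drop": acc_en - acc_id,
        "yes_no_flips": yn_flips,
        "laterality_flips": side_flips,
    }


if __name__ == "__main__":
    sample = [
        {"gold": "yes", "pred_en": "Yes", "pred_id": "Tidak"},
        {"gold": "left", "pred_en": "left lung", "pred_id": "paru kanan"},
        {"gold": "no", "pred_en": "no", "pred_id": "tidak"},
    ]
    print(robustness_report(sample))
```

Keyword matching of this kind is deliberately crude: an answer such as "Ya, di paru kiri" hits two categories and is left unlabeled. A real evaluation would need proper answer normalization and, for open-ended questions, a different correctness check.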

cs.CL, cs.CV

HOW THIS AFFECTS YOU

● researcher: Quantifies the language robustness gap for medical VLMs; a useful benchmark for teams evaluating how clinical AI generalizes across languages.

● health: Medical VLMs degrade meaningfully on non-English clinical queries; deploying radiology AI in non-English-speaking markets requires explicit multilingual validation.

SOURCE

https://arxiv.org/abs/2606.03693
