[arXiv]
Indonesian Radiology VQA Benchmark Reveals Language Robustness Gaps in Medical VLMs
June 3, 2026
IndoRad-VQA adapts the VQA-RAD radiology benchmark to Bahasa Indonesia and shows that medical VLMs lose accuracy when prompted in Indonesian rather than English, with failure modes that include yes/no flips and laterality errors. The evaluation covers general-purpose, Southeast Asian multilingual, and medical-specific VLMs.
cs.CL, cs.CV
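The summary does not say how the accuracy gap or the failure modes are scored. The sketch that follows is a minimal illustration, assuming each question has paired English and Indonesian model answers and that short Indonesian answers can be mapped to the English gold vocabulary through a small hypothetical token map (`ID_TO_EN`). It computes the English-minus-Indonesian accuracy gap, a yes/no flip rate (opposite polarity across languages), and a laterality error rate (wrong side named in Indonesian). `PairedItem`, `robustness_report`, the exact-match scoring, and the toy data are illustrative assumptions, not the IndoRad-VQA protocol or results.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical map from Indonesian answer tokens to the English gold vocabulary.
ID_TO_EN = {"ya": "yes", "tidak": "no", "kiri": "left", "kanan": "right"}


@dataclass
class PairedItem:
    """One question asked in both languages (assumed schema, not the paper's)."""
    gold: str     # reference answer in English, e.g. "yes" or "left"
    pred_en: str  # model answer to the English prompt
    pred_id: str  # model answer to the Bahasa Indonesia prompt


def normalize(answer: str) -> str:
    """Lowercase, trim end punctuation, and map known Indonesian tokens to English."""
    words = answer.lower().strip().strip(".!?").split()
    return " ".join(ID_TO_EN.get(w, w) for w in words)


def side(answer: str) -> Optional[str]:
    """Return 'left' or 'right' if exactly one side is mentioned, else None."""
    sides = set(normalize(answer).split()) & {"left", "right"}
    return sides.pop() if len(sides) == 1 else None


def accuracy(items: List[PairedItem], lang: str) -> float:
    """Exact-match accuracy after normalization; lang is 'en' or 'id'."""
    preds = [it.pred_en if lang == "en" else it.pred_id for it in items]
    correct = sum(normalize(p) == normalize(it.gold) for p, it in zip(preds, items))
    return correct / len(items)


def yes_no_flip_rate(items: List[PairedItem]) -> float:
    """Share of yes/no items whose English and Indonesian answers have opposite polarity."""
    closed = [it for it in items if normalize(it.gold) in {"yes", "no"}]
    if not closed:
        return 0.0
    # A flip means one language answered "yes" and the other "no".
    flips = sum({normalize(it.pred_en), normalize(it.pred_id)} == {"yes", "no"} for it in closed)
    return flips / len(closed)


def laterality_error_rate(items: List[PairedItem]) -> float:
    """Share of left/right items where the Indonesian answer names the wrong side."""
    sided = [it for it in items if side(it.gold) is not None]
    if not sided:
        return 0.0
    errors = sum(
        side(it.pred_id) is not None and side(it.pred_id) != side(it.gold)
        for it in sided
    )
    return errors / len(sided)


def robustness_report(items: List[PairedItem]) -> Dict[str, float]:
    """Collect the gap and failure-mode rates into one dictionary."""
    acc_en = accuracy(items, "en")
    acc_id = accuracy(items, "id")
    return {
        "acc_en": acc_en,
        "acc_id": acc_id,
        "gap": acc_en - acc_id,  # positive means the model does worse in Indonesian
        "yes_no_flip_rate": yes_no_flip_rate(items),
        "laterality_error_rate": laterality_error_rate(items),
    }


# Toy data for illustration only; these are not IndoRad-VQA results.
example_items = [
    PairedItem(gold="yes", pred_en="yes", pred_id="Tidak."),    # polarity flip
    PairedItem(gold="no", pred_en="no", pred_id="tidak"),       # consistent and correct
    PairedItem(gold="left", pred_en="left", pred_id="kanan"),   # laterality error
    PairedItem(gold="right", pred_en="left", pred_id="kanan"),  # correct only in Indonesian
]

print(robustness_report(example_items))
```

Exact match after normalization keeps the sketch dependency-free; open-ended answers would need a softer metric, which this example does not attempt.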
HOW THIS AFFECTS YOU
● Researcher: Quantifies the language robustness gap for medical VLMs, giving teams a benchmark for evaluating how well clinical AI generalizes across languages.
● Health: Medical VLMs lose accuracy on non-English clinical queries, so deploying radiology AI in non-English-speaking markets requires explicit multilingual validation.