[arXiv] score: 0.24
Improving Medical VQA through Trajectory-Aware Process Supervision
May 7, 2026
Researchers augmented six medical VQA benchmarks with reasoning trajectories generated by the COMCTS algorithm, then trained vision-language models with Group Relative Policy Optimization (GRPO). The process-based reward scores reasoning-step alignment as the Dynamic Time Warping (DTW) distance between sentence-embedded trajectories, moving supervision beyond exact-match answer evaluation.
cs.LG cs.CV
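
The summary describes a reward based on the DTW distance between sentence-embedded reasoning trajectories, used inside GRPO. The sketch below illustrates that idea only and is not the paper's implementation. It assumes each reasoning step has already been embedded as a nonzero vector by some sentence encoder. The cosine step cost, the length normalization, the `exp(-d)` mapping from distance to reward, and all function names are assumptions made for illustration. The last function shows the standard group-relative normalization of rewards that GRPO is named for.

```python
import numpy as np


def cosine_distance_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance between rows of a (n, d) and b (m, d).

    Assumes every row is a nonzero embedding vector.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(n*m) Dynamic Time Warping over step embeddings."""
    cost = cosine_distance_matrix(a, b)
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three admissible warping moves.
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return float(acc[n, m])


def process_reward(pred: np.ndarray, ref: np.ndarray) -> float:
    """Hypothetical reward: closer trajectories score nearer to 1.

    The length normalization and exp(-d) mapping are assumptions,
    not taken from the paper.
    """
    d = dtw_distance(pred, ref) / (len(pred) + len(ref))
    return float(np.exp(-d))


def group_relative_advantages(rewards, eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantages: standardize rewards within one sampled group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real sentence embeddings.
    reference = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
    sampled_group = [
        np.array([[1.0, 0.1], [0.6, 0.8], [0.1, 1.0]]),  # close to reference
        np.array([[0.0, 1.0], [1.0, 0.0]]),              # steps out of order
    ]
    rewards = [process_reward(traj, reference) for traj in sampled_group]
    print(rewards, group_relative_advantages(rewards))
```

Because DTW permits one step to align with several, a trajectory that repeats or splits a reasoning step is not penalized for length alone, which is one plausible reason to prefer it over a position-by-position comparison.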