[arXiv] score: 0.41
Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models
May 14, 2026
The study finds that removing the vision modality from vision language models (VLMs) causes large accuracy drops and severe miscalibration. Supplying generated images partially restores both accuracy and calibration, indicating that the failure is not solely due to missing semantic information.
cs.CL, cs.AI, cs.CV