[arXiv] score: 0.41

Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models

May 14, 2026

The study shows that removing the vision modality from vision-language models (VLMs) causes large accuracy drops and severe miscalibration. Generated images partially restore both accuracy and calibration, indicating that the failure is not solely due to missing semantic information.

cs.CL, cs.AI, cs.CV

SOURCE

https://arxiv.org/abs/2605.12517
