[arXiv] score: 0.49

State-of-the-Art VLMs Fail Analog Clock Reading Due to Suppressed Reasoning, Not Perception

May 26, 2026

Across seven VLMs, analog clock reading errors stem primarily from suppressed reasoning, not from visual perception failures: when given image input, the models produce outputs 5–19x shorter that skip step-by-step reasoning. The paper also introduces TickTockVQA, a real-world annotated benchmark.

cs.CV

HOW THIS AFFECTS YOU

● builder: Image-only prompting can silently degrade chain-of-thought reasoning quality in VLMs, which is relevant for any pipeline relying on visual input for structured reasoning tasks.

● researcher: Identifies reasoning suppression under image-only input as a systematic failure mode distinct from perception, with implications for multimodal evaluation design.

SOURCE

https://arxiv.org/abs/2603.08011
