[HUGGINGFACE] score: 0.48
VLMs Conflate Vertical Image Position with Depth, Bias Worsens with Scale
May 27, 2026
A probing study using minimal contrastive pairs finds that VLMs consistently entangle vertical image position with perceived distance, a perspective bias from natural photo statistics. The accuracy gap between perspective-consistent and counter-heuristic examples grows under data scaling even as overall benchmark scores improve, suggesting that benchmarks mask a structural spatial reasoning failure.
HOW THIS AFFECTS YOU
● researcher: Benchmark accuracy improvements on spatial tasks may be masking a deepening heuristic shortcut rather than genuine 3D understanding. Contrastive probing pairs are a useful diagnostic tool to adopt.
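As a rough illustration of how such a diagnostic could be scored, the sketch below computes accuracy separately on perspective-consistent and counter-heuristic probes and reports the difference. It is a minimal sketch and not the paper's evaluation code: the `ProbeResult` record, its field names, and the `heuristic_gap` function are hypothetical, and the sample records are toy values, not results from the study.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ProbeResult:
    """One model answer on one image of a contrastive pair (hypothetical schema)."""
    pair_id: str
    consistent: bool  # True if the image agrees with the vertical-position heuristic
    correct: bool     # True if the model's depth answer was right


def heuristic_gap(results: Iterable[ProbeResult]) -> Tuple[float, float, float]:
    """Return (consistent accuracy, counter-heuristic accuracy, gap)."""
    results = list(results)

    def accuracy(flag: bool) -> float:
        subset = [r.correct for r in results if r.consistent == flag]
        if not subset:
            raise ValueError("each condition needs at least one probe result")
        return sum(subset) / len(subset)

    consistent_acc = accuracy(True)
    counter_acc = accuracy(False)
    # A large positive gap suggests the model leans on the shortcut.
    return consistent_acc, counter_acc, consistent_acc - counter_acc


if __name__ == "__main__":
    # Toy data for illustration only.
    toy = [
        ProbeResult("p1", consistent=True, correct=True),
        ProbeResult("p1", consistent=False, correct=False),
        ProbeResult("p2", consistent=True, correct=True),
        ProbeResult("p2", consistent=False, correct=True),
    ]
    consistent_acc, counter_acc, gap = heuristic_gap(toy)
    print(f"consistent={consistent_acc:.2f} counter={counter_acc:.2f} gap={gap:.2f}")
```

Tracking the gap alongside the aggregate score across model or data scales is one way to notice when the overall number rises while the counter-heuristic subset stagnates.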