Imaginative Perception Tokens Improve VLM Spatial Reasoning on Unseen Viewpoints
June 2, 2026
Imaginative Perception Tokens (IPT) are intermediate representations that externalize what a vision-language model (VLM) would perceive from alternative spatial configurations. They target tasks such as perspective-taking, path tracing, and multiview counting. Three new benchmark tasks with ~20K examples evaluate spatial reasoning that requires inference beyond directly observable information.
HOW THIS AFFECTS YOU
● Researcher: IPT introduces a concrete mechanism and three benchmarks for measuring imaginative spatial reasoning, a capability gap not well-covered by existing VLM evals.