[arXiv]score: 0.38

SToRe3D: Sparse Token Relevance in ViTs for Efficient Multi-View 3D Object Detection

May 15, 2026

Introduces SToRe3D, a sparsity framework for Vision Transformers that jointly selects 2D image tokens and 3D object queries using mutual relevance heads to reduce inference latency in multi-view 3D detection.

cs.CVcs.RO

SOURCE

https://arxiv.org/abs/2605.14110

← back to feed