[arXiv]score: 0.38
SToRe3D: Sparse Token Relevance in ViTs for Efficient Multi-View 3D Object Detection
May 15, 2026
Introduces SToRe3D, a sparsity framework for Vision Transformers that jointly selects 2D image tokens and 3D object queries using mutual relevance heads to reduce inference latency in multi-view 3D detection.
cs.CVcs.RO