[arXiv] score: 0.21

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments

April 30, 2026

RADIO-ViPE is an online semantic SLAM system that performs open-vocabulary grounding from monocular RGB input alone: it requires no camera intrinsics, depth sensors, or pose initialization. It tightly couples RADIO foundation-model embeddings into factor graph optimization with adaptive robust kernels, which enables 3D localization of arbitrary natural-language queries in dynamic scenes. Roboticists and embodied-AI engineers deploying perception stacks without calibrated RGB-D hardware should find it relevant, since it removes a sensor dependency that limits real-world deployment of prior pipelines such as NeRF-SLAM or OpenMask3D.
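To make "3D localization of natural-language queries" concrete, here is a minimal sketch of the general idea, not the paper's method: each map point carries an embedding, and a query embedding is matched against them by cosine similarity. The names `localize_query`, `toy_map`, and `min_score`, along with the toy 3-dimensional embeddings, are hypothetical; a real system would obtain the query vector from a text encoder aligned with the same feature space.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def localize_query(map_points, query_embedding, min_score=0.5):
    """Return (position, score) of the best-matching map point, or None.

    map_points: iterable of (xyz_position, embedding) pairs (hypothetical layout).
    min_score:  assumed similarity threshold below which no match is reported.
    """
    best = None
    for position, embedding in map_points:
        score = cosine(embedding, query_embedding)
        if score >= min_score and (best is None or score > best[1]):
            best = (position, score)
    return best


# Toy map: three 3D points with made-up 3-dimensional embeddings.
toy_map = [
    ((0.0, 0.0, 1.0), (1.0, 0.0, 0.0)),
    ((2.0, 0.5, 3.0), (0.0, 1.0, 0.0)),
    ((1.0, 1.0, 2.0), (0.6, 0.8, 0.0)),
]
toy_query = (0.0, 1.0, 0.0)  # stand-in for an encoded text query

result = localize_query(toy_map, toy_query)
print(result)
```

The threshold lets the lookup report "not found" when nothing in the map resembles the query, which matters for open-vocabulary use where a query may name an object that is absent from the scene.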

cs.CV

SOURCE

https://arxiv.org/abs/2604.26067
