[arXiv]

S$^2$-VLA: State-Space Guided Vision-Language-Action Models for Long-Horizon Manipulation

June 29, 2026

Researchers propose S$^2$-VLA, a framework for long-horizon robotic manipulation built around a State-Space Guided Adaptive Attention mechanism, which dynamically fuses visual, language, and action representations using adaptive gating weights. Evaluated on a benchmark dataset, the model achieves a 25% reduction in error rate compared with existing vision-language-action (VLA) models. The framework is designed to adapt to different phases of task execution, with the goal of more accurate and efficient long-horizon manipulation.
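The summary does not say how the gating weights are computed, so the following is only an illustrative sketch of one common pattern: softmax gates derived from a state vector, used to take a weighted sum of per-modality features. The function names, the linear gate projection, and the use of a single state vector are all assumptions made for illustration, not the paper's method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(state, gate_weights, features):
    """Fuse modality features with gates computed from a state vector.

    All names here are hypothetical:
      state:        summary of the current task phase (list of floats)
      gate_weights: one row per modality, each the same length as `state`
      features:     one feature vector per modality, all the same length
    """
    # One logit per modality: a linear projection of the state.
    logits = [sum(w * s for w, s in zip(row, state)) for row in gate_weights]
    gates = softmax(logits)  # non-negative, sums to 1
    dim = len(features[0])
    # Weighted sum of the modality features, element by element.
    fused = [sum(g * f[i] for g, f in zip(gates, features)) for i in range(dim)]
    return gates, fused


# Toy inputs: three modalities (visual, language, action), 2-d features.
features = [[3.0, 0.0], [0.0, 3.0], [3.0, 3.0]]
state = [1.0, 0.0]

# Zero gate weights give equal gates, so the fusion is a plain average.
uniform_gates, uniform_fused = gated_fusion(state, [[0.0, 0.0]] * 3, features)

# A large weight on the first modality shifts the fusion toward it.
skewed_gates, skewed_fused = gated_fusion(
    state, [[5.0, 0.0], [0.0, 0.0], [0.0, 0.0]], features
)
```

Because the gates depend on the state, the same features can be mixed differently as the state changes, which is one plausible reading of "adapting to different phases of task execution".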
