AffordanceVLA Adds Structured Affordance Forecasting to Robotic VLA Models
June 3, 2026
AffordanceVLA inserts three intermediate representations between vision-language understanding and action generation: object-centric grounding (Which2Act), 2D interaction localization (Where2Act), and a third component. The goal is to address the semantic-to-control mismatch in vision-language-action (VLA) models, where high-level scene and instruction understanding does not translate directly into precise motor commands. The framework targets more precise manipulation in instruction-following robotics.
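To make the staged "which object, then where on it" design more concrete, here is a minimal Python sketch of how the two named intermediate outputs might be represented and handed toward an action head. Everything in it is an assumption for illustration. The type names (`Which2ActOutput`, `Where2ActOutput`, `AffordanceForecast`), the pixel box and point formats, the point-in-box consistency check, and the normalized conditioning vector are all hypothetical and are not AffordanceVLA's published interfaces. The unnamed third component is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels; assumed format
Point = Tuple[float, float]              # (x, y) in pixels; assumed format


@dataclass(frozen=True)
class Which2ActOutput:
    """Hypothetical object-centric grounding result: which object to act on."""
    object_label: str
    box_xyxy: Box


@dataclass(frozen=True)
class Where2ActOutput:
    """Hypothetical 2D interaction localization result: where to make contact."""
    point_xy: Point


@dataclass(frozen=True)
class AffordanceForecast:
    """Bundles the two named intermediate outputs (third component omitted)."""
    which: Which2ActOutput
    where: Where2ActOutput

    def is_consistent(self) -> bool:
        # Assumed sanity check: the interaction point should fall on the grounded object.
        x, y = self.where.point_xy
        x0, y0, x1, y1 = self.which.box_xyxy
        return x0 <= x <= x1 and y0 <= y <= y1


def to_conditioning_vector(f: AffordanceForecast, width: float, height: float) -> List[float]:
    """Hypothetical: flatten box and point into image-normalized values
    (in [0, 1] for coordinates inside the image) for a downstream action head."""
    x0, y0, x1, y1 = f.which.box_xyxy
    px, py = f.where.point_xy
    return [x0 / width, y0 / height, x1 / width, y1 / height, px / width, py / height]


# Usage: a 400x200 image with a grounded mug and a contact point at its center.
forecast = AffordanceForecast(
    which=Which2ActOutput("mug", (100.0, 50.0, 300.0, 150.0)),
    where=Where2ActOutput((200.0, 100.0)),
)
print(forecast.is_consistent())                        # True
print(to_conditioning_vector(forecast, 400.0, 200.0))  # [0.25, 0.25, 0.75, 0.75, 0.5, 0.5]
```

Keeping the "which" and "where" predictions as explicit, inspectable values is what makes a cascade like this easy to check stage by stage; an actual model may instead pass learned embeddings to its action decoder.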
HOW THIS AFFECTS YOU
● Researcher: The affordance-as-intermediate-representation approach is a concrete architectural contribution to the VLA perception-action gap problem, worth benchmarking against OpenVLA and similar baselines.