
AffordanceVLA Adds Structured Affordance Forecasting to Robotic VLA Models

June 3, 2026

AffordanceVLA introduces three intermediate representations between vision-language understanding and action generation: object-centric grounding (Which2Act), 2D interaction localization (Where2Act), and a third component. Together they address the semantic-to-control mismatch in vision-language-action (VLA) models. The framework targets manipulation precision in instruction-following robotics.
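As a rough illustration of what "intermediate representations between understanding and action" can look like in code, here is a minimal sketch. It is not the AffordanceVLA implementation: every name (`AffordanceForecast`, `run_pipeline`, the stage callables), the order of the stages, and the inputs each stage receives are assumptions made for illustration. The summary does not name the third component, so it is left as an opaque placeholder.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class AffordanceForecast:
    """Hypothetical container for the intermediate representations."""
    target_object: str                      # Which2Act: object-centric grounding
    interaction_point: Tuple[float, float]  # Where2Act: 2D interaction location (assumed pixel coords)
    third: Optional[Any] = None             # third component, unnamed in the summary; left opaque


def run_pipeline(
    image: Any,
    instruction: str,
    which2act: Callable[[Any, str], str],
    where2act: Callable[[Any, str], Tuple[float, float]],
    third_stage: Callable[[Any, str, Tuple[float, float]], Any],
    policy: Callable[[Any, str, AffordanceForecast], Any],
) -> Any:
    """Forecast affordances first, then condition action generation on them."""
    target = which2act(image, instruction)    # which object to act on
    point = where2act(image, target)          # where on the image to interact
    extra = third_stage(image, target, point) # placeholder for the unnamed stage
    forecast = AffordanceForecast(target, point, extra)
    # The action head sees the structured forecast, not only raw vision-language features.
    return policy(image, instruction, forecast)
```

Each stage is passed in as a plain callable, so any stage can be stubbed out or swapped when comparing against a baseline that maps vision-language features directly to actions.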

HOW THIS AFFECTS YOU

Researcher: Using affordances as an intermediate representation is a concrete architectural response to the perception-action gap in VLA models, and is worth benchmarking against OpenVLA and similar baselines.

Source: huggingface.co
