240-Scenario Benchmark Tests MLLMs on Visual Social Cues
June 16, 2026
A new benchmark with 240 scenarios and 2,340 role-task instances evaluates whether multimodal LLMs (MLLMs) can interpret visual social signals such as facial expressions and gaze during interaction. Tests of seven recent MLLMs show near-saturated performance on expression and conflict tasks but significant gaps in interaction regulation and visually grounded outcome prediction.
HOW THIS AFFECTS YOU
● Researcher: Provides a structured evaluation framework for identifying where current MLLMs fail at visual social reasoning, specifically in interaction regulation tasks.