
240-Scenario Benchmark Tests MLLMs on Visual Social Cues

June 16, 2026

A new benchmark with 240 scenarios and 2,340 role-task instances evaluates whether multimodal LLMs (MLLMs) can interpret visual social signals, such as facial expressions and gaze, during interaction. Tests of seven recent MLLMs show near-saturated performance on expression and conflict tasks but significant gaps in interaction regulation and visually grounded outcome prediction.

HOW THIS AFFECTS YOU

● Researcher: Provides a structured evaluation framework for identifying where current MLLMs fail at visual social reasoning, particularly in interaction-regulation tasks.

Source: arxiv.org
