[HUGGINGFACE] score: 0.47

QUACK Framework Audits Multimodal LLM Agent Grounding in Social Deduction Games

May 25, 2026

QUACK is an open-source multimodal evaluation framework that audits LLM agent language grounding across three levels — game outcomes, behavioral trajectories, and utterance-level consistency — using a Statement Verification Pipeline tied to agent perception and actions.

paper

HOW THIS AFFECTS YOU

● builder: You can use QUACK's open-source pipeline to audit whether your multimodal agents' language is actually grounded in what they perceive, not just contextually plausible.

● researcher: Worth watching because it moves social deduction game evaluation beyond win rates to utterance-level grounding verification, exposing failure modes invisible to outcome-only metrics.

SOURCE

https://huggingface.co/papers/2605.27068
