QUACK Framework Audits Multimodal LLM Agent Grounding in Social Deduction Games
May 25, 2026
QUACK is an open-source multimodal evaluation framework that audits LLM agent language grounding across three levels — game outcomes, behavioral trajectories, and utterance-level consistency — using a Statement Verification Pipeline tied to agent perception and actions.
HOW THIS AFFECTS YOU
● builder: You can use QUACK's open-source pipeline to audit whether your multimodal agents' language is actually grounded in what they perceive, not just contextually plausible.
● researcher: Worth watching because it moves social deduction game evaluation beyond win rates to utterance-level grounding verification, exposing failure modes invisible to outcome-only metrics.
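
To make the idea of utterance-level grounding concrete, here is a minimal Python sketch of checking an agent's stated claims against a log of what it actually perceived. This is not QUACK's API or its Statement Verification Pipeline: the `Claim` type, the `verify_claims` and `grounding_rate` functions, and the example facts are all hypothetical, and the exact-match check stands in for whatever verification the framework really performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """Hypothetical structured form of one factual claim in an utterance."""
    subject: str
    predicate: str
    obj: str


def verify_claims(claims: list[Claim], perceived: set[Claim]) -> list[bool]:
    """Return True for each claim that matches something the agent perceived."""
    return [claim in perceived for claim in claims]


def grounding_rate(claims: list[Claim], perceived: set[Claim]) -> Optional[float]:
    """Fraction of claims supported by perception, or None if there are no claims."""
    if not claims:
        return None
    verdicts = verify_claims(claims, perceived)
    return sum(verdicts) / len(verdicts)


# Assumed example data: what the agent observed during a round.
perceived = {
    Claim("red", "was_in", "kitchen"),
    Claim("blue", "was_in", "lab"),
}

# Claims extracted (by some upstream step, not shown) from one utterance.
claims = [
    Claim("red", "was_in", "kitchen"),   # supported by perception
    Claim("blue", "was_in", "kitchen"),  # not supported by perception
]

print(verify_claims(claims, perceived))
print(grounding_rate(claims, perceived))
```

An unsupported claim in this sketch only means the statement does not match the perception log. In a social deduction game that could be deliberate deception as easily as a grounding failure, so a real audit would need more context than this check provides.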