[HUGGINGFACE] score: 0.55
Perceptual Judgment Bias: MLLM Judges Favor Text Over Visual Evidence
May 31, 2026
Controlled visual perturbation experiments show that multimodal LLM (MLLM) judges systematically reward plausible text responses over perceptually correct ones when the two conflict, a failure mode termed Perceptual Judgment Bias. A counterfactual dataset, the Perceptually Perturbed Judgment Dataset, is introduced to isolate and measure this bias.
paper
HOW THIS AFFECTS YOU
● builder: If you're using multimodal LLMs as automated evaluators for vision tasks, this finding means your eval pipeline may systematically reward hallucinated but plausible text over correct visual grounding.
● researcher: Perceptual Judgment Bias is a concrete, measurable failure mode in MLLM evaluation pipelines, and the counterfactual dataset construction methodology is directly applicable to auditing existing judge models.
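
As a rough illustration of what auditing a judge could look like, the sketch below scores a judge on pairs where a visually grounded response conflicts with a fluent but visually wrong one. Everything here is an assumption made for illustration: `ConflictPair`, the `Judge` callable signature, and the `text_preference_rate` metric are hypothetical names, not the paper's dataset schema or its reported measure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class ConflictPair:
    """One hypothetical audit item: two responses that disagree about the image."""
    image_id: str
    grounded_response: str   # agrees with what the (perturbed) image shows
    plausible_response: str  # reads well, but contradicts the image

# Assumed judge interface: judge(image_id, response_a, response_b) -> "a" or "b".
Judge = Callable[[str, str, str], str]

def text_preference_rate(pairs: Iterable[ConflictPair], judge: Judge) -> float:
    """Fraction of verdicts in which the judge picks the visually wrong response.

    Each pair is judged in both presentation orders so that a judge with a
    pure position preference lands at 0.5 rather than at 0 or 1.
    """
    items = list(pairs)
    if not items:
        raise ValueError("need at least one conflict pair")
    wrong = 0
    for p in items:
        # Order 1: grounded response shown as "a", plausible as "b".
        if judge(p.image_id, p.grounded_response, p.plausible_response) == "b":
            wrong += 1
        # Order 2: plausible response shown as "a", grounded as "b".
        if judge(p.image_id, p.plausible_response, p.grounded_response) == "a":
            wrong += 1
    return wrong / (2 * len(items))
```

A rate near 0 would suggest the judge follows the visual evidence on these pairs, while a rate well above 0.5 would be consistent with the text-over-vision preference described above. In practice `judge` would wrap a call to the MLLM under audit.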