[arXiv] score: 0.12

LLM Reviews Show Limited Human Alignment and Are Gameable via Iterative Revision

May 29, 2026

Experiments on 2025 ACL Rolling Review submissions show that LLM-generated reviews align only weakly with human reviewers, and that the degree of alignment is sensitive to the prompt. Authors who use an iterative LLM draft-revise workflow can raise their LLM review scores by a statistically significant margin, which raises concerns about review integrity as conferences pilot LLM-assisted reviewing.

cs.AI, cs.MA

HOW THIS AFFECTS YOU

● researcher: If you submit to venues using LLM-assisted review, this confirms that optimizing drafts against LLM feedback can inflate scores without necessarily improving human-judged quality.

● policy: Empirical evidence that LLM review systems are gameable at scale should inform conference governance decisions about where and how to deploy automated reviewing tools.

SOURCE

https://arxiv.org/abs/2605.28897
