[arXiv] score: 0.12
LLM Reviews Show Limited Human Alignment and Are Gameable via Iterative Revision
May 29, 2026
Experiments on 2025 ACL Rolling Review submissions show that LLM-generated reviews align only to a limited degree with human reviewers, and that this alignment is sensitive to the prompt. Authors who use an iterative LLM draft-revise workflow can raise LLM review scores by a statistically significant margin, which raises concerns about review integrity as conferences pilot LLM-assisted reviewing.
cs.AI cs.MA
HOW THIS AFFECTS YOU
● researcher: If you submit to venues using LLM-assisted review, this confirms that optimizing drafts against LLM feedback can inflate scores without necessarily improving human-judged quality.
● policy: Empirical evidence that LLM review systems are gameable at scale should inform conference governance decisions about where and how to deploy automated reviewing tools.