Justice in Judgment: Unveiling (Hidden) Bias in LLM-assisted Peer Reviews
Sai Suresh Macharla Vasu, Ivaxi Sheth, Hui-Po Wang, Ruta Binkyte, Mario Fritz
TL;DR
This study systematically evaluates bias in LLM-assisted peer reviews using a controlled, single-blind framework with counterfactual interventions on author affiliation, inferred gender, seniority, and publication history across nine LLMs. It uncovers a robust affiliation bias favoring high-status institutions, detectable in hard ratings and especially in soft rating distributions. Gender effects are mixed, whereas seniority and publication-history cues clearly inflate ratings for more prestigious profiles. The results show that subtle metadata can change acceptance decisions for papers near the decision threshold, raising concerns about fairness and reliability in AI-augmented peer review. The authors argue for rigorous evaluation and alignment to mitigate these biases as LLMs become more integrated into scientific evaluation pipelines.
Abstract
The adoption of large language models (LLMs) is transforming the peer review process, from assisting reviewers in writing more detailed evaluations to generating entire reviews automatically. While these capabilities offer exciting opportunities, they also raise critical concerns about fairness and reliability. In this paper, we investigate bias in LLM-generated peer reviews by conducting controlled experiments on sensitive metadata, including author affiliation and gender. Our analysis consistently shows affiliation bias favoring institutions highly ranked on common academic rankings. Additionally, we find some gender preferences, which, even though subtle in magnitude, have the potential to compound over time. Notably, we uncover implicit biases that become more evident with token-based soft ratings.
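To illustrate why a token-based soft rating can expose a bias that a hard rating hides, the following is a minimal sketch rather than the paper's implementation. It assumes the soft rating is the expected score under the model's renormalized probabilities over rating tokens, and the hard rating is the single most likely score; the exact definition used in the paper is not given in this section. The function names and the log-probabilities for the two counterfactual profiles are hypothetical values chosen for illustration.

```python
import math


def soft_rating(logprobs):
    """Expected rating under the renormalized distribution over rating tokens.

    `logprobs` maps an integer rating to the log-probability the model
    assigned to the corresponding token (assumed input format).
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(logprobs.values())
    weights = {rating: math.exp(lp - m) for rating, lp in logprobs.items()}
    total = sum(weights.values())
    return sum(rating * w for rating, w in weights.items()) / total


def hard_rating(logprobs):
    """Most likely rating, i.e. what greedy decoding would emit."""
    return max(logprobs, key=logprobs.get)


# Hypothetical log-probabilities for the same paper under two
# counterfactual author profiles (illustrative numbers, not results).
profile_a = {5: math.log(0.30), 6: math.log(0.50), 7: math.log(0.20)}
profile_b = {5: math.log(0.15), 6: math.log(0.50), 7: math.log(0.35)}

for name, lp in (("A", profile_a), ("B", profile_b)):
    print(f"profile {name}: hard={hard_rating(lp)} soft={soft_rating(lp):.2f}")
```

In this toy case both profiles receive the same hard rating, so a comparison of emitted scores would report no difference, while the soft ratings differ because probability mass has shifted toward the higher score for one profile.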
