Justice in Judgment: Unveiling (Hidden) Bias in LLM-assisted Peer Reviews

Sai Suresh Macharla Vasu, Ivaxi Sheth, Hui-Po Wang, Ruta Binkyte, Mario Fritz

TL;DR

This study systematically evaluates bias in LLM-assisted peer reviews using a controlled, single-blind framework with counterfactual interventions on author affiliation, inferred gender, seniority, and publication history across nine LLMs. It uncovers robust affiliation bias favoring high-status institutions, detectable in hard ratings and especially in soft rating distributions, and finds mixed gender effects alongside clear seniority and publication-history cues that inflate ratings for more prestigious profiles. The results show that subtle metadata can change acceptance decisions for papers near the decision threshold, raising concerns about fairness and reliability in AI-augmented peer review. The authors argue for rigorous evaluation and alignment to mitigate these biases as LLMs become more integrated into scientific evaluation pipelines.

Abstract

The adoption of large language models (LLMs) is transforming the peer review process, from assisting reviewers in writing more detailed evaluations to generating entire reviews automatically. While these capabilities offer exciting opportunities, they also raise critical concerns about fairness and reliability. In this paper, we investigate bias in LLM-generated peer reviews by conducting controlled experiments on sensitive metadata, including author affiliation and gender. Our analysis consistently shows affiliation bias favoring institutions highly ranked on common academic rankings. Additionally, we find some gender preferences, which, even though subtle in magnitude, have the potential to compound over time. Notably, we uncover implicit biases that become more evident with token-based soft ratings.
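
The abstract does not spell out how a token-based soft rating is computed. As a minimal sketch, assume the reviewer model exposes log-probabilities for each candidate rating token and that the soft rating is the probability-weighted mean of those ratings. The function names and the example numbers below are hypothetical and are not taken from the paper; they only illustrate why a shift that leaves the hard rating unchanged can still show up in the soft rating.

```python
import math

def hard_rating(token_logprobs):
    """Return the single most likely rating (the argmax token)."""
    return int(max(token_logprobs, key=token_logprobs.get))

def soft_rating(token_logprobs):
    """Return the expected rating under the renormalized token distribution.

    token_logprobs maps a rating token such as "6" to its log-probability.
    This is an assumed definition, not necessarily the paper's.
    """
    top = max(token_logprobs.values())
    # Subtract the max log-probability before exponentiating for stability.
    weights = {int(tok): math.exp(lp - top) for tok, lp in token_logprobs.items()}
    total = sum(weights.values())
    return sum(rating * w for rating, w in weights.items()) / total

# Hypothetical distributions for the same paper under two affiliations.
with_affiliation_a = {"5": math.log(0.20), "6": math.log(0.50), "7": math.log(0.30)}
with_affiliation_b = {"5": math.log(0.30), "6": math.log(0.50), "7": math.log(0.20)}

same_hard = hard_rating(with_affiliation_a) == hard_rating(with_affiliation_b)
soft_gap = soft_rating(with_affiliation_a) - soft_rating(with_affiliation_b)
```

In this made-up case both conditions give the same hard rating, while the soft ratings differ by roughly 0.2 points, which is the kind of implicit shift a hard-rating comparison alone would miss.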

Paper Structure

This paper contains 30 sections, 3 equations, 4 figures, 25 tables.

Figures (4)

  • Figure 1: Publication history bias. % of papers where the LLM assigns a higher rating to the author shown with 100 TTP compared to 0 TTP.
  • Figure 2: Seniority bias. % of papers where the LLM assigns a higher rating to a Senior PI profile compared to an Undergraduate Student.
  • Figure 3: Standardized review prompt used in all LLM experiments.
  • Figure 4: Affiliation bias heatmaps for all evaluated models, ordered by model size. Each cell $(A, B)$ shows the number of papers for which affiliation $A$ received a higher rating than $B$.
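
The statistics named in these captions are pairwise comparisons over the same set of papers: the percentage of papers rated higher under one condition than another (Figures 1 and 2), and a count for every ordered pair of conditions (Figure 4). The sketch below shows one way such numbers could be tallied. The data layout, function names, and ratings are assumptions made for illustration, not the authors' code or results.

```python
from itertools import permutations

def win_counts(ratings):
    """Count, for each ordered pair (a, b), the papers where a was rated above b.

    ratings maps a paper id to a dict of {condition: rating}.
    Ties count for neither direction.
    """
    conditions = sorted({c for per_paper in ratings.values() for c in per_paper})
    wins = {pair: 0 for pair in permutations(conditions, 2)}
    for per_paper in ratings.values():
        for a, b in permutations(per_paper, 2):
            if per_paper[a] > per_paper[b]:
                wins[(a, b)] += 1
    return wins

def percent_higher(ratings, a, b):
    """Percentage of papers (rated under both conditions) where a beats b."""
    both = [r for r in ratings.values() if a in r and b in r]
    if not both:
        return 0.0
    return 100.0 * sum(r[a] > r[b] for r in both) / len(both)

# Hypothetical ratings for four papers under two placeholder conditions.
example = {
    "paper_1": {"A": 7, "B": 6},
    "paper_2": {"A": 6, "B": 6},
    "paper_3": {"A": 5, "B": 6},
    "paper_4": {"A": 8, "B": 6},
}

counts = win_counts(example)
share = percent_higher(example, "A", "B")
```

Because ties are excluded from both directions, the counts for (A, B) and (B, A) need not sum to the number of papers, so an asymmetry between the two cells is what signals a directional preference.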