What happens when reviewers receive AI feedback in their reviews?

Shiping Chen, Shu Zhong, Duncan P. Brumby, Anna L. Cox

TL;DR

The paper addresses how AI feedback affects peer-review practice by empirically studying a live deployment at ICLR 2025. It uses a mixed-methods design—surveys and follow-up interviews—to examine reviewer perceptions, actions, and attitudes toward AI-generated post-review suggestions. The findings reveal a tension between observable improvements (clarity, tone, and structure) and perceived usefulness or authority, highlighting the importance of timing, role framing, and human ownership in AI-assisted reviewing. The work offers design and governance implications to sustain human-centered AI in peer review, including lightweight, early-stage support, transparency, and system-level safeguards. Overall, it contributes an initial empirical window into AI-assisted peer review and calls for broader community dialogue on values, norms, and governance.

Abstract

AI is reshaping academic research, yet its role in peer review remains polarising and contentious. Advocates see its potential to reduce reviewer burden and improve quality, while critics warn of risks to fairness, accountability, and trust. At ICLR 2025, an official AI feedback tool was deployed to provide reviewers with post-review suggestions. We studied this deployment through surveys and interviews, investigating how reviewers engaged with the tool and perceived its usability and impact. Our findings surface both opportunities and tensions when AI augments peer review. This work contributes the first empirical evidence of such an AI tool in a live review process, documenting how reviewers respond to AI-generated feedback in a high-stakes review context. We further offer design implications for AI-assisted reviewing that aim to enhance quality while safeguarding human expertise, agency, and responsibility.

Paper Structure (55 sections, 2 figures, 5 tables)

This paper contains 55 sections, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Example of ICLR AI feedback from a participant. Bracketed items [ ] are omitted for confidentiality. Coloured labels are used here to distinguish reviewer comments from AI feedback; the original feedback email was presented in plain text.
  • Figure 2: Participants’ perceptions of the AI feedback tool ($N=51$). The figure shows responses to ten Likert-scale statements (1 = strongly disagree, 7 = strongly agree) displayed as a divergent stacked bar chart. Bars are centred on the neutral midpoint (4 = neither agree nor disagree), with disagreement shown to the left and agreement to the right. The plot also includes the item "consider revising review" for easier comparison, though this item is analysed as part of the behavioural matrix in Section \ref{sec:results:behaviour}.