A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications

Dongyeop Kang; Waleed Ammar; Bhavana Dalvi; Madeleine van Zuylen; Sebastian Kohlmeier; Eduard Hovy; Roy Schwartz

TL;DR

PeerRead is the first public dataset of scientific peer reviews released for research purposes. It combines 14.7K paper drafts and their accept/reject decisions with 10.7K expert-written reviews covering a subset of those papers. It aggregates opt-in, web, and arXiv-derived labels across venues such as ACL, NIPS, and ICLR, and provides structured aspect scores for a subset of reviews. The authors analyze how overall recommendations relate to individual review aspects and propose two NLP tasks, acceptance prediction and aspect-score regression. On the first, simple models achieve up to 21% error reduction compared to the majority baseline; on the second, they outperform the mean baseline for high-variance aspects such as 'originality' and 'impact'. The dataset enables reproducible analysis and practical NLP applications to assist reviewers and authors, and it opens the way to studying biases and cross-venue differences in peer review.

Abstract

Peer reviewing is a central component in the scientific publishing process. We present the first public dataset of scientific peer reviews available for research purposes (PeerRead v1) providing an opportunity to study this important artifact. The dataset consists of 14.7K paper drafts and the corresponding accept/reject decisions in top-tier venues including ACL, NIPS and ICLR. The dataset also includes 10.7K textual peer reviews written by experts for a subset of the papers. We describe the data collection process and report interesting observed phenomena in the peer reviews. We also propose two novel NLP tasks based on this dataset and provide simple baseline models. In the first task, we show that simple models can predict whether a paper is accepted with up to 21% error reduction compared to the majority baseline. In the second task, we predict the numerical scores of review aspects and show that simple models can outperform the mean baseline for aspects with high variance such as 'originality' and 'impact'.
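The two baselines named in the abstract can be made concrete with a small sketch. It assumes that "error reduction" means the relative drop in classification error with respect to always predicting the most frequent decision, and that the mean baseline is scored with root-mean-squared error; the abstract states neither the formula nor the metric. All function names and the toy labels and scores below are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from math import sqrt


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def relative_error_reduction(baseline_accuracy, model_accuracy):
    """Fraction of the baseline's errors that the model removes (assumed definition)."""
    baseline_error = 1.0 - baseline_accuracy
    model_error = 1.0 - model_accuracy
    return (baseline_error - model_error) / baseline_error


def mean_baseline_rmse(scores):
    """RMSE of always predicting the mean score (RMSE is an assumed metric)."""
    mean = sum(scores) / len(scores)
    return sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))


# Toy, made-up data for illustration only.
toy_decisions = ["reject"] * 6 + ["accept"] * 4
toy_aspect_scores = [1, 2, 3, 4, 5]

baseline_acc = majority_baseline_accuracy(toy_decisions)
print(round(baseline_acc, 2))
print(round(relative_error_reduction(baseline_acc, 0.7), 2))
print(round(mean_baseline_rmse(toy_aspect_scores), 3))
```

Under this reading, a model only beats the mean baseline if its RMSE falls below the spread of the scores themselves. That fits the abstract's remark that gains appear for aspects with high variance.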
