Identifying Aspects in Peer Reviews

Sheng Lu, Ilia Kuznetsov, Iryna Gurevych

TL;DR

The paper tackles the challenge of standardizing and supporting peer review amid increasing submission volumes by introducing a data-driven, bottom-up notion of review aspects. It defines an operational concept of aspect, builds a GPT-4o–driven workflow to extract and organize aspects into a multi-level taxonomy, and releases a dataset of reviews annotated with aspects. Two tasks are proposed and evaluated: paper aspect prediction (PAP) and review aspect prediction (RAP), revealing that coarse-grained aspects yield robust performance for high-level analyses while fine-grained aspects enable nuanced applications, including LLM-generated review detection. The work demonstrates that fine-grained, data-driven aspects complement guideline-based schemata and offer a principled foundation for NLP-assisted peer review, with practical implications for track-aware review guidelines and interpretable LLM-detection methods.

Abstract

Peer review is central to academic publishing, but the growing volume of submissions is straining the process. This motivates the development of computational approaches to support peer review. While each review is tailored to a specific paper, reviewers often make assessments according to certain aspects such as Novelty, which reflect the values of the research community. This alignment creates opportunities for standardizing the reviewing process, improving quality control, and enabling computational support. While prior work has demonstrated the potential of aspect analysis for peer review assistance, the notion of aspect remains poorly formalized. Existing approaches often derive aspects from review forms and guidelines, yet data-driven methods for aspect identification are underexplored. To address this gap, our work takes a bottom-up approach: we propose an operational definition of aspect and develop a data-driven schema for deriving aspects from a corpus of peer reviews. We introduce a dataset of peer reviews augmented with aspects and show how it can be used for community-level review analysis. We further show how the choice of aspects can impact downstream applications, such as LLM-generated review detection. Our results lay a foundation for a principled and data-driven investigation of review aspects, and pave the path for new applications of NLP to support peer review.

Paper Structure

This paper contains 29 sections, 3 equations, 6 figures, 24 tables.

Figures (6)

  • Figure 1: The 5 most frequent aspects in 4 submission tracks in EMNLP23. Figure \ref{frequency_all_tracks} shows the full results.
  • Figure 2: The heatmap of the Jaccard similarity between each pair of the human-written reviews and LLM-generated reviews generated using EMNLP23 papers and liang2024feedback's prompt. Figure \ref{more_on_heatmap_comparison} in Appendix \ref{more_on_review_comparison} shows the rest of the results.
  • Figure 3: The Fleiss' Kappas for the annotations across different ranges. For example, "600" on the x-axis indicates that the first 600 annotations have a Fleiss' Kappa of 0.4739. The overall Fleiss' Kappa is 0.1944. \ref{validity_check}
  • Figure 4: The Levenshtein similarity of the 10 most frequent aspects within each submission track in EMNLP23. \ref{aspect_analysis}
  • Figure 5: The 5 most frequent aspects in each of the submission tracks in EMNLP23. \ref{aspect_analysis}, \ref{more_on_aspect_analysis}
  • ...and 1 more figure
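
The Jaccard similarity named in the Figure 2 caption can be illustrated with a minimal sketch. It assumes each review is represented as a set of aspect labels, which the caption does not state explicitly. The function name `jaccard`, the example aspect labels, and the convention for two empty sets are hypothetical choices made for illustration, not taken from the paper.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Return |a ∩ b| / |a ∪ b| for two sets of aspect labels.

    Assumption: two empty sets are treated as identical (similarity 1.0).
    """
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical aspect sets for one human-written and one LLM-generated review.
human_aspects = {"Novelty", "Clarity", "Experiments"}
llm_aspects = {"Novelty", "Clarity", "Reproducibility", "Related Work"}

# 2 shared labels out of 5 distinct labels overall.
score = jaccard(human_aspects, llm_aspects)
print(f"Jaccard similarity: {score:.2f}")
```

Computing this score for every pair of reviews would give a matrix that could be drawn as a heatmap of the kind the caption describes.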