Reviewing the Reviewer: Elevating Peer Review Quality through LLM-Guided Feedback

Sukannya Purkayastha, Qile Wan, Anne Lauscher, Lizhen Qu, Iryna Gurevych

TL;DR

The paper addresses declining peer-review quality driven by lazy thinking and vague feedback under increasing submission volumes. It introduces LazyReviewPlus, a single-segment multi-label dataset for lazy thinking and specificity, and an LLM-guided framework that combines a neuro-symbolic issue detector with a genetic-algorithm–enhanced feedback generator. The proposed system outperforms zero-shot baselines in issue detection and delivers targeted, guideline-aligned feedback, with substantial improvements in constructiveness and relevance and strong alignment with Prometheus and human judgments. The LazyReviewPlus dataset is released to support further research and practical deployment toward scalable, high-quality peer review.

Abstract

Peer review is central to scientific quality, yet reliance on simple heuristics, known as lazy thinking, has lowered standards. Prior work treats lazy thinking detection as a single-label task, but review segments may exhibit multiple issues, including broader clarity problems or specificity issues. Turning detection into actionable improvements requires guideline-aware feedback, which is currently missing. We introduce an LLM-driven framework that decomposes reviews into argumentative segments, identifies issues via a neurosymbolic module combining LLM features with traditional classifiers, and generates targeted feedback using issue-specific templates refined by a genetic algorithm. Experiments show our method outperforms zero-shot LLM baselines and improves review quality by up to 92.4%. We also release LazyReviewPlus, a dataset of 1,309 sentences labeled for lazy thinking and specificity.
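The segment, detect, and feedback stages described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the label names, cue phrases, templates, and function names are hypothetical, keyword matching stands in for the neurosymbolic detector, and the genetic-algorithm refinement of templates is omitted. It only shows why detection is multi-label: one segment can receive several issue labels, each mapped to its own feedback template.

```python
import re
from dataclasses import dataclass, field

# Hypothetical issue labels and cue phrases. The paper's actual taxonomy and
# detector (LLM features + traditional classifiers) are not reproduced here.
ISSUE_CUES = {
    "lazy_thinking": ("not novel", "too simple", "should be compared"),
    "low_specificity": ("unclear", "some parts", "in general"),
}

# Hypothetical feedback templates, one per issue label.
TEMPLATES = {
    "lazy_thinking": "Support this judgment with evidence or a concrete reference.",
    "low_specificity": "Point to the exact section, table, or claim you mean.",
}


@dataclass
class SegmentReport:
    text: str
    issues: list = field(default_factory=list)
    feedback: list = field(default_factory=list)


def split_segments(review: str) -> list:
    """Naive stand-in for argumentative segmentation: split at sentence ends."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]


def detect_issues(segment: str) -> list:
    """Multi-label detection: a segment may get zero, one, or several labels."""
    lowered = segment.lower()
    return [
        label
        for label, cues in ISSUE_CUES.items()
        if any(cue in lowered for cue in cues)
    ]


def review_feedback(review: str) -> list:
    """Run the three stages and collect one report per segment."""
    reports = []
    for seg in split_segments(review):
        issues = detect_issues(seg)
        reports.append(SegmentReport(seg, issues, [TEMPLATES[i] for i in issues]))
    return reports


if __name__ == "__main__":
    sample = (
        "The method is not novel and some parts are unclear. "
        "The experiments are thorough."
    )
    for report in review_feedback(sample):
        print(report.issues, "|", report.text)
```

In this sketch the first sample sentence triggers both hypothetical labels while the second triggers none, which is the case a single-label formulation cannot represent.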

Paper Structure

This paper contains 47 sections, 1 equation, 22 figures, and 29 tables.

Figures (22)

  • Figure 1: Overall pipeline of our method. We first identify segments within a review, then detect issues, and finally generate feedback to improve each segment.
  • Figure 2: Overview of sentence and label distributions in our proposed dataset, LazyReviewPlus. Issue names have been rewritten for brevity.
  • Figure 3: Comparing the label distribution using various split methods.
  • Figure 4: Confusion matrices for all the models on the segment detection task.
  • Figure 5: Issue Identification Prompt
  • ...and 17 more figures