Reviewing the Reviewer: Elevating Peer Review Quality through LLM-Guided Feedback
Sukannya Purkayastha, Qile Wan, Anne Lauscher, Lizhen Qu, Iryna Gurevych
TL;DR
The paper addresses declining peer-review quality driven by lazy thinking and vague feedback under increasing submission volumes. It introduces LazyReviewPlus, a single-segment multi-label dataset for lazy thinking and specificity, and an LLM-guided framework that combines a neurosymbolic issue detector with a genetic-algorithm–enhanced feedback generator. The proposed system outperforms zero-shot baselines in issue detection and delivers targeted, guideline-aligned feedback, with substantial improvements in constructiveness and relevance and strong alignment with Prometheus and human judgments. The LazyReviewPlus dataset is released to support further research and practical deployment toward scalable, high-quality peer review.
Abstract
Peer review is central to scientific quality, yet reliance on simple heuristics (lazy thinking) has lowered standards. Prior work treats lazy thinking detection as a single-label task, but a review segment may exhibit multiple issues, including broader clarity problems or specificity issues. Turning detection into actionable improvements requires guideline-aware feedback, which is currently missing. We introduce an LLM-driven framework that decomposes reviews into argumentative segments, identifies issues via a neurosymbolic module combining LLM features with traditional classifiers, and generates targeted feedback using issue-specific templates refined by a genetic algorithm. Experiments show our method outperforms zero-shot LLM baselines and improves review quality by up to 92.4%. We also release LazyReviewPlus, a dataset of 1,309 sentences labeled for lazy thinking and specificity.
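
The three stages named in the abstract (segmentation, multi-label issue detection, template-based feedback) can be pictured with a toy sketch. This is an illustration under loud assumptions, not the authors' implementation: the label names, cue phrases, and templates are hypothetical, a regex sentence split stands in for argumentative segmentation, keyword indicators stand in for LLM-derived features, a fixed threshold stands in for the traditional classifier, and the genetic-algorithm refinement of templates is left out entirely. The only point it makes is that each label is decided independently, so one segment can carry several issues at once.

```python
import re

# Hypothetical labels and cue phrases; the abstract does not list the real label set.
CUES = {
    "lazy_thinking": ("not novel", "too simple", "results are not surprising"),
    "low_specificity": ("unclear", "some parts", "in general"),
}

# Hypothetical feedback templates, one per issue label.
TEMPLATES = {
    "lazy_thinking": "Justify this judgment against the review guidelines: '{segment}'",
    "low_specificity": "Point to the exact section or result you mean: '{segment}'",
}


def split_segments(review):
    """Naive sentence split, standing in for argumentative segmentation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def extract_features(segment):
    """Stand-in for LLM-derived features: one binary cue indicator per label."""
    text = segment.lower()
    return {
        label: float(any(cue in text for cue in cues))
        for label, cues in CUES.items()
    }


def detect_issues(segment, threshold=0.5):
    """One independent decision per label, so a segment may receive several labels."""
    features = extract_features(segment)
    return sorted(label for label, score in features.items() if score >= threshold)


def give_feedback(review):
    """Return (label, feedback) pairs for every issue found in every segment."""
    feedback = []
    for segment in split_segments(review):
        for label in detect_issues(segment):
            feedback.append((label, TEMPLATES[label].format(segment=segment)))
    return feedback


if __name__ == "__main__":
    review = "The method is too simple and some parts are unclear. The experiments are thorough."
    for label, message in give_feedback(review):
        print(f"[{label}] {message}")
```

In this sketch the first sentence of the sample review triggers both hypothetical labels, while the second triggers none, which is the multi-label behaviour a single-label formulation cannot express.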
