PeerArg: Argumentative Peer Review with LLMs
Purin Sukpanichnant, Anna Rapberger, Francesca Toni
TL;DR
The paper addresses bias and opacity in peer review by introducing PeerArg, a hybrid system that combines large language models (LLMs) with symbolic, quantitative argumentation to produce interpretable acceptance predictions. The approach is a three-stage pipeline: (i) a Review QBAF Extractor converts each review into an argumentation framework, (ii) a Review QBAFs Combinator merges the frameworks of multiple reviews, and (iii) a Pre-MPAF Aggregator derives the final decision via two aggregation paths. PeerArg is evaluated against a few-shot end-to-end LLM on three datasets (PRA, PeerRead, MOPRD); with carefully chosen hyperparameters, it outperforms the end-to-end model while offering a transparent rationale for each decision. The work aims to make automated peer-review decision making more trustworthy and explainable, with future directions including enhanced explainability and handling of uncertainty in aggregation.
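To make the idea of quantitative argumentation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tree-shaped quantitative bipolar argumentation framework (QBAF) per review, a DF-QuAD-style gradual semantics, and a simple mean-plus-threshold rule standing in for the combination and aggregation stages. All names (`Argument`, `review_qbaf`, `predict`), the base score 0.5, and the threshold are hypothetical; in PeerArg the review QBAFs are extracted with the help of an LLM, whereas here they are written by hand.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Argument:
    """Node of an acyclic QBAF: a base score plus attacking/supporting children."""
    name: str
    base: float  # base score in [0, 1]
    attackers: list["Argument"] = field(default_factory=list)
    supporters: list["Argument"] = field(default_factory=list)


def aggregate(strengths):
    """Probabilistic sum 1 - prod(1 - s); yields 0.0 for an empty list."""
    return 1.0 - prod(1.0 - s for s in strengths)


def strength(arg):
    """DF-QuAD-style strength, computed recursively from the leaves up."""
    va = aggregate([strength(a) for a in arg.attackers])
    vs = aggregate([strength(s) for s in arg.supporters])
    if va >= vs:
        # Attackers dominate: move the base score towards 0.
        return arg.base - arg.base * (va - vs)
    # Supporters dominate: move the base score towards 1.
    return arg.base + (1.0 - arg.base) * (vs - va)


def review_qbaf(aspects):
    """Hand-written stand-in for stage (i): one review -> one QBAF.

    `aspects` is a list of (name, polarity, score) tuples; polarity > 0
    means the aspect supports acceptance, otherwise it attacks it.
    """
    root = Argument("accept", 0.5)  # neutral base score (assumption)
    for name, polarity, score in aspects:
        child = Argument(name, score)
        (root.supporters if polarity > 0 else root.attackers).append(child)
    return root


def predict(reviews, threshold=0.5):
    """Placeholder for stages (ii)-(iii): average review strengths, then threshold."""
    scores = [strength(review_qbaf(r)) for r in reviews]
    mean = sum(scores) / len(scores)
    return ("accept" if mean >= threshold else "reject"), mean


reviews = [
    [("clarity", +1, 0.8), ("novelty", -1, 0.3)],
    [("clarity", -1, 0.6), ("soundness", -1, 0.5)],
]
decision, score = predict(reviews)
print(decision, round(score, 3))
```

Because every intermediate strength is an explicit number attached to a named argument, the final decision can be traced back to the aspects that pushed it up or down, which is the kind of transparency the pipeline is designed to provide.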
Abstract
Peer review is an essential process for determining the quality of papers submitted to scientific conferences or journals. However, it is subjective and prone to biases. Several studies have applied NLP techniques to support peer review, but they rely on black-box techniques whose outputs are difficult to interpret and trust. In this paper, we propose a novel pipeline to support and understand the reviewing and decision-making processes of peer review: the PeerArg system, which combines LLMs with methods from knowledge representation. PeerArg takes as input a set of reviews for a paper and outputs a prediction of the paper's acceptance. We evaluate the performance of the PeerArg pipeline on three different datasets, in comparison with a novel end-2-end LLM that uses few-shot learning to predict paper acceptance given reviews. The results indicate that the end-2-end LLM is capable of predicting paper acceptance from reviews, but a variant of the PeerArg pipeline outperforms this LLM.
