PeerArg: Argumentative Peer Review with LLMs

Purin Sukpanichnant, Anna Rapberger, Francesca Toni

TL;DR

The paper addresses biases and opacity in peer review by introducing PeerArg, a hybrid system that fuses large language models with symbolic, quantitative argumentation to produce interpretable acceptance predictions. The approach centers on a three-stage pipeline: (i) Review QBAF Extractor to convert reviews into argumentation frameworks, (ii) Review QBAFs Combinator to merge multiple reviews, and (iii) Pre-MPAF Aggregator to derive a final decision via two aggregation paths. Empirically, PeerArg is evaluated against a few-shot end-to-end LLM on three datasets (PRA, PeerRead, MOPRD) and, with carefully chosen hyperparameters, outperforms the end-to-end model while offering transparent rationale for the decision. The work advances practical trust and explainability in automated peer-review decision making, with future directions toward enhanced explainability and handling aggregation uncertainty.

Abstract

Peer review is an essential process for determining the quality of papers submitted to scientific conferences or journals. However, it is subjective and prone to biases. Several studies have applied NLP techniques to support peer review, but they rely on black-box techniques whose outputs are difficult to interpret and trust. In this paper, we propose a novel pipeline to support and understand the reviewing and decision-making processes of peer review: the PeerArg system, which combines LLMs with methods from knowledge representation. PeerArg takes as input a set of reviews for a paper and outputs an acceptance prediction for that paper. We evaluate the performance of the PeerArg pipeline on three different datasets, in comparison with a novel end-2-end LLM that uses few-shot learning to predict paper acceptance given reviews. The results indicate that the end-2-end LLM is capable of predicting paper acceptance from reviews, but a variant of the PeerArg pipeline outperforms this LLM.

Paper Structure (21 sections, 15 equations, 11 figures, 2 tables)

Figures (11)

  • Figure 1: Overview of the PeerArg pipeline. Firstly, a bipolar argumentation framework BAF$_i$ is extracted from review $i$. Then, the frameworks are combined. The final decision is drawn from the combined framework.
  • Figure 2: End-2-End LLM Input Template (the sample reviews are truncated due to lack of space)
  • Figure 3: PeerArg pipeline diagram
  • Figure 4: The process to obtain a review QBAF from a review.
  • Figure 5: From an incomplete QBAF to a complete QBAF; edges labeled with minus and plus are attacks and supports, respectively. (1) First, we apply QBAF semantics to get the strengths of the aspect arguments. (2) From the aspect arguments' strengths, we determine their relation to the decision argument and calculate their scores (red and green numbers). (3) We apply QBAF semantics to get the final strength of the decision argument.
  • ...and 6 more figures
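
The three steps in the Figure 5 caption can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in this summary: it uses DF-QuAD, one commonly used gradual semantics for QBAFs, as a stand-in for "QBAF semantics"; the rule in `relation_and_score` (threshold 0.5, linear rescaling) is purely hypothetical; and all function names and the default base score are illustrative rather than taken from the paper.

```python
from math import prod


def aggregate(strengths):
    """Probabilistic sum 1 - prod(1 - v); gives 0.0 when the list is empty."""
    return 1.0 - prod(1.0 - v for v in strengths)


def dfquad_strength(base, attackers, supporters):
    """DF-QuAD strength of one argument (assumed semantics, see lead-in).

    base: base score in [0, 1]
    attackers / supporters: strengths of the attacking / supporting arguments
    """
    va = aggregate(attackers)
    vs = aggregate(supporters)
    if va >= vs:
        # Attack dominates (or ties): move the base score toward 0.
        return base - base * (va - vs)
    # Support dominates: move the base score toward 1.
    return base + (1.0 - base) * (vs - va)


def relation_and_score(strength, threshold=0.5):
    """HYPOTHETICAL step (2): map an aspect strength to a relation and a score."""
    if strength > threshold:
        return "support", (strength - threshold) / (1.0 - threshold)
    if strength < threshold:
        return "attack", (threshold - strength) / threshold
    return "none", 0.0


def decision_strength(aspects, decision_base=0.5):
    """aspects: list of (base, attacker_strengths, supporter_strengths)."""
    attackers, supporters = [], []
    for base, att, sup in aspects:
        s = dfquad_strength(base, att, sup)        # step (1)
        relation, score = relation_and_score(s)    # step (2), hypothetical rule
        if relation == "attack":
            attackers.append(score)
        elif relation == "support":
            supporters.append(score)
    return dfquad_strength(decision_base, attackers, supporters)  # step (3)


# Usage: one aspect backed by a supporter, one aspect undermined by an attacker.
example_aspects = [(0.5, [], [0.8]), (0.5, [0.6], [])]
final_strength = decision_strength(example_aspects)
```

The sketch handles only the acyclic, two-layer shape described in the caption (leaf arguments, aspect arguments, one decision argument); a different semantics or a different step (2) rule would change the resulting numbers.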

Theorems & Definitions (14)

  • Definition 1
  • Definition 2
  • Definition 3
  • Example 1
  • Definition 4
  • Example 2
  • Definition 5
  • Example 3
  • Definition 6
  • Definition 7
  • ...and 4 more