Automatic Paper Reviewing with Heterogeneous Graph Reasoning over LLM-Simulated Reviewer-Author Debates
Shuaimin Li, Liyang Fan, Yufang Lin, Zeyang Li, Xian Wei, Shiwen Ni, Hamid Alinejad-Rokny, Min Yang
TL;DR
Peer review is strained by growing reviewer workload and subjective judgments. ReViewGraph introduces a pipeline that simulates multi-round reviewer-author debates with LLMs, encodes them as a heterogeneous graph with four node types and eight edge types, and applies a heterogeneous graph transformer to predict accept/reject decisions. The approach shows statistically significant improvements over seven baselines across three OpenReview-derived ICLR datasets, without requiring LLM parameter updates. Ablation and case studies highlight the importance of explicit debate structure and heterogeneous reasoning for robust, interpretable decisions. The result is a scalable framework for AI-assisted peer review that captures argumentative dynamics and evaluation criteria in a structured, graph-based representation.
Abstract
Existing paper review methods often rely on superficial manuscript features or directly on large language models (LLMs), which are prone to hallucinations, biased scoring, and limited reasoning capabilities. Moreover, these methods often fail to capture the complex argumentative reasoning and negotiation dynamics inherent in reviewer-author interactions. To address these limitations, we propose ReViewGraph (Reviewer-Author Debates Graph Reasoner), a novel framework that performs heterogeneous graph reasoning over LLM-simulated multi-round reviewer-author debates. In our approach, reviewer-author exchanges are simulated through LLM-based multi-agent collaboration. Diverse opinion relations (e.g., acceptance, rejection, clarification, and compromise) are then explicitly extracted and encoded as typed edges within a heterogeneous interaction graph. By applying graph neural networks to reason over these structured debate graphs, ReViewGraph captures fine-grained argumentative dynamics and enables more informed review decisions. Extensive experiments on three datasets demonstrate that ReViewGraph outperforms strong baselines with an average relative improvement of 15.73%, underscoring the value of modeling detailed reviewer-author debate structures.
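To make the idea of typed edges concrete, the following is a minimal sketch of a debate graph in plain Python, not the authors' implementation. The four relation names come from the abstract; the node type names (reviewer_point, author_response), the scalar node features, and the per-relation weights are hypothetical placeholders chosen only for illustration. The update step is a simplified relation-aware mean aggregation standing in for the graph neural network, not the heterogeneous graph transformer itself.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Relation names are taken from the abstract; everything else is hypothetical.
RELATIONS = ("acceptance", "rejection", "clarification", "compromise")


@dataclass
class DebateGraph:
    node_type: dict = field(default_factory=dict)  # node id -> type name
    edges: list = field(default_factory=list)      # (src, relation, dst)

    def add_node(self, node_id, type_name):
        self.node_type[node_id] = type_name

    def add_edge(self, src, relation, dst):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, relation, dst))

    def incoming_by_relation(self, node_id):
        """Group the sources of a node's incoming edges by relation type."""
        grouped = defaultdict(list)
        for src, relation, dst in self.edges:
            if dst == node_id:
                grouped[relation].append(src)
        return dict(grouped)


def relation_aware_step(graph, features, relation_weight):
    """One message-passing round: each node adds the weighted mean of its
    incoming neighbours' features, computed separately per relation type."""
    updated = {}
    for node_id, value in features.items():
        total = value
        for relation, sources in graph.incoming_by_relation(node_id).items():
            mean = sum(features[s] for s in sources) / len(sources)
            total += relation_weight[relation] * mean
        updated[node_id] = total
    return updated


# A toy two-round exchange (hypothetical content).
graph = DebateGraph()
graph.add_node("r1", "reviewer_point")
graph.add_node("a1", "author_response")
graph.add_node("r2", "reviewer_point")
graph.add_edge("a1", "clarification", "r1")  # author clarifies a concern
graph.add_edge("r2", "acceptance", "a1")     # reviewer accepts the reply

features = {"r1": 1.0, "a1": 2.0, "r2": 4.0}
weights = {"acceptance": 0.5, "rejection": -0.5,
           "clarification": 0.25, "compromise": 0.1}
updated = relation_aware_step(graph, features, weights)
```

Keeping a separate aggregate per relation type is what lets a model treat an acceptance edge differently from a rejection edge. A learned model would replace the fixed scalar weights with trainable, type-specific parameters.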
