Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions

Cheng Tan, Dongxin Lyu, Siyuan Li, Zhangyang Gao, Jingxuan Wei, Siqi Ma, Zicheng Liu, Stan Z. Li

TL;DR

This work reframes academic peer review as a long-context, multi-turn dialogue among authors, reviewers, and decision makers, and introduces ReviewMT, a dataset built from ICLR and Nature Communications, to simulate complete, iterative peer-review interactions. It formalizes the three-role workflow with explicit turns (initial reviews, author rebuttals, final reviews, and meta decisions) and proposes a suite of evaluation metrics tailored to each role. In experiments on multiple open-source LLMs, supervised fine-tuning significantly improves performance on validity and quality metrics, which highlights the value of role-based dialogue for scalable, fair peer review. The dataset and framework offer a foundation for future research on LLM-assisted peer review and open avenues for improving efficiency and transparency in scholarly publishing.

Abstract

Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields and have shown significant potential in the academic peer-review process. However, existing applications are primarily limited to static review generation based on submitted papers, which fails to capture the dynamic and iterative nature of real-world peer review. In this paper, we reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers. We construct a comprehensive dataset containing 26,841 papers with 92,017 reviews collected from multiple sources, including a top-tier conference and a prestigious journal. This dataset is designed to facilitate the application of LLMs to multi-turn dialogues, simulating the complete peer-review process. Furthermore, we propose a series of metrics to evaluate the performance of LLMs in each role under this reformulated peer-review setting, supporting fair and comprehensive evaluation. We believe this work provides a promising perspective on enhancing the LLM-driven peer-review process by incorporating dynamic, role-based interactions. It aligns closely with the iterative and interactive nature of real-world academic peer review, offering a robust foundation for future research and development in this area. We open-source the dataset at https://github.com/chengtan9907/ReviewMT.
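
To make the reformulation concrete, the following is a minimal Python sketch of how one paper's review process could be laid out as a role-tagged, multi-turn dialogue. It is an illustration only, not the ReviewMT data format: the names `Turn`, `build_dialogue`, `ROLES`, and `STAGES`, the "submission" stage, and the choice to group turns by stage rather than by reviewer are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical role and stage labels; the actual dataset may name these differently.
ROLES = ("author", "reviewer", "decision_maker")
STAGES = ("submission", "initial_review", "rebuttal", "final_review", "meta_decision")


@dataclass
class Turn:
    role: str     # who is speaking, one of ROLES
    stage: str    # which step of the process, one of STAGES
    content: str  # the text of the turn

    def __post_init__(self) -> None:
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage}")


def build_dialogue(
    paper: str,
    initial_reviews: List[str],
    rebuttals: List[str],
    final_reviews: List[str],
    decision: str,
) -> List[Turn]:
    """Order the turns: paper, initial reviews, rebuttals, final reviews, decision."""
    # Assumption: each initial review gets exactly one rebuttal and one final review.
    if not (len(initial_reviews) == len(rebuttals) == len(final_reviews)):
        raise ValueError("expected one rebuttal and one final review per initial review")
    turns = [Turn("author", "submission", paper)]
    turns += [Turn("reviewer", "initial_review", text) for text in initial_reviews]
    turns += [Turn("author", "rebuttal", text) for text in rebuttals]
    turns += [Turn("reviewer", "final_review", text) for text in final_reviews]
    turns.append(Turn("decision_maker", "meta_decision", decision))
    return turns


# Toy example with two reviewers; all strings are placeholders, not dataset content.
dialogue = build_dialogue(
    paper="<full paper text>",
    initial_reviews=["<review 1>", "<review 2>"],
    rebuttals=["<rebuttal to review 1>", "<rebuttal to review 2>"],
    final_reviews=["<final review 1>", "<final review 2>"],
    decision="<meta review and decision>",
)
for turn in dialogue:
    print(f"{turn.role:>14} | {turn.stage}")
```

Because the full paper text opens the dialogue and every later turn is conditioned on everything before it, a layout of this kind is long-context by construction, which is the property the abstract emphasizes.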

Paper Structure

This paper contains 12 sections, 2 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Comparison of existing LLM applications in peer review and our reformulated framework.
  • Figure 2: Overview of the data processing pipeline for the ReviewMT dataset.
  • Figure 3: Statistics of the ICLR papers and reviews in the ReviewMT-ICLR dataset.
  • Figure 4: The word cloud of the keywords in the ReviewMT dataset.
  • Figure 5: The radar chart of text similarity metrics for LLMs on the ReviewMT-ICLR dataset.
  • ...and 1 more figure