Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis

Jianxiang Yu, Zichen Ding, Jiaqi Tan, Kangyang Luo, Zhenmin Weng, Chenghua Gong, Long Zeng, Renjing Cui, Chengcheng Han, Qiushi Sun, Zhiyong Wu, Yunshi Lan, Xiang Li

TL;DR

SEA is a three-module framework for automated paper reviewing: Standardization (SEA-S), Evaluation (SEA-E), and Analysis (SEA-A). It targets two problems: the flood of submissions and inconsistent reviewer feedback. SEA-S distills GPT-4's data-standardization capabilities into Mistral-7B to merge the multiple reviews of a paper into a consistent, richly annotated format. SEA-E then fine-tunes a long-context LLM on the standardized data to generate comprehensive, evidence-backed reviews. SEA-A defines a mismatch score that measures the consistency between a review and the paper's content, and adds a self-correction mechanism to improve that consistency. Experiments on datasets from eight venues (including NeurIPS and ICLR) show that SEA outperforms baselines on content quality, formatting, and alignment with human feedback, with SEA-EA (SEA-E plus self-correction) achieving the best results. The work offers a practical, scalable approach to automated scientific reviewing that gives authors constructive feedback and points toward more consistent, rigorous peer evaluation, while acknowledging current limitations and future directions such as domain expansion and rebuttal support.

Abstract

In recent years, the rapid increase in scientific papers has overwhelmed traditional review mechanisms, resulting in varying quality of publications. Although existing methods have explored the capabilities of Large Language Models (LLMs) for automated scientific reviewing, their generated content is often generic or partial. To address these issues, we introduce SEA, an automated paper reviewing framework. It comprises three modules: Standardization, Evaluation, and Analysis, which are represented by the models SEA-S, SEA-E, and SEA-A, respectively. First, SEA-S distills the data standardization capabilities of GPT-4 to integrate multiple reviews of a paper. Then, SEA-E is fine-tuned on the standardized data, enabling it to generate constructive reviews. Finally, SEA-A introduces a new evaluation metric, the mismatch score, to assess the consistency between paper contents and reviews. Moreover, we design a self-correction strategy to enhance this consistency. Extensive experimental results on datasets collected from eight venues show that SEA can generate valuable insights for authors to improve their papers.

Paper Structure (41 sections, 3 equations, 9 figures, 11 tables)

Figures (9)

  • Figure 1: Multiple reviews of a paper often provide helpful but partial opinions on certain aspects. Integrating these reviews can offer more comprehensive feedback on the paper.
  • Figure 2: The overall framework of SEA consists of three modules: Standardization, Evaluation and Analysis.
  • Figure 3: Content analysis results.
  • Figure 4: Format analysis of different models.
  • Figure 5: The performance of different models on mismatch scores across various datasets.
  • ...and 4 more figures