APRES: An Agentic Paper Revision and Evaluation System

Bingchen Zhao; Jenny Zhang; Chenxi Whitehouse; Minqi Jiang; Michael Shvartsman; Abhishek Charnalia; Despoina Magka; Tatiana Shavrina; Derek Dunfield; Oisin Mac Aodha; Yoram Bachrach

TL;DR

APRES, a novel method powered by Large Language Models, revises a scientific paper's text based on an evaluation rubric. The authors discover a rubric that is highly predictive of future citation counts and integrate it with APRES in an automated system that revises papers to enhance their quality and impact.

Abstract

Scientific discoveries must be communicated clearly to realize their full potential. Without effective communication, even the most groundbreaking findings risk being overlooked or misunderstood. The primary way scientists communicate their work and receive feedback from the community is through peer review. However, the current system often produces feedback that is inconsistent across reviewers, which hinders the improvement of a manuscript and limits its potential impact. In this paper, we introduce APRES, a novel method powered by Large Language Models (LLMs) that updates a scientific paper's text based on an evaluation rubric. Our automated method discovers a rubric that is highly predictive of future citation counts and integrates it with APRES in an automated system that revises papers to enhance their quality and impact. Crucially, this objective should be met without altering the core scientific content. We demonstrate the success of APRES, which improves future citation prediction by 19.6% in mean absolute error over the next best baseline, and show that our paper revision process yields papers that human expert evaluators prefer over the originals 79% of the time. Our findings provide strong empirical support for using LLMs as a tool to help authors stress-test their manuscripts before submission. Ultimately, our work seeks to augment, not replace, the essential role of human expert reviewers, for it should be humans who discern which discoveries truly matter, guiding science toward advancing knowledge and enriching lives.
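The abstract reports the citation-prediction gain as a percentage change in mean absolute error (MAE) relative to a baseline. The sketch below shows one common reading of such a figure, namely the relative reduction in MAE. The function names and the toy citation counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_mae_reduction(baseline_mae: float, method_mae: float) -> float:
    """Fractional reduction in MAE relative to a baseline (0.2 means 20% lower)."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return (baseline_mae - method_mae) / baseline_mae


# Hypothetical toy citation counts, for illustration only (not from the paper).
true_citations = [10, 0, 25, 4]
baseline_pred = [5, 5, 15, 4]   # absolute errors: 5, 5, 10, 0
method_pred = [8, 2, 19, 2]     # absolute errors: 2, 2, 6, 2

baseline_mae = mean_absolute_error(true_citations, baseline_pred)
method_mae = mean_absolute_error(true_citations, method_pred)
reduction = relative_mae_reduction(baseline_mae, method_mae)
print(f"baseline MAE={baseline_mae:.2f}, method MAE={method_mae:.2f}, reduction={reduction:.1%}")
```

Under this reading, a lower MAE is better, and the reported percentage compares the method's MAE against that of the strongest competing baseline rather than against zero error.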

Paper Structure (24 sections, 1 equation, 10 figures, 4 tables)

Introduction
Related Work
Method
Predicting Future Impact
Using LLM Reviews to Improve a Paper's Clarity
Experiments
Impact Prediction
Paper Improvement
Discussion
Conclusion
Effectiveness of LLM reviewers
Glicko2 rating
Results and Analysis.
Reviewer Consistency
Motivation.
...and 9 more sections

Figures (10)

Figure 1: APRES is a two-stage framework that utilizes the same agentic search scaffold (left) for two distinct tasks. The right details the inner-loop logic, executed in two stages, at each node of the search tree. Stage 1 - Rubric Search (top right): an agentic search process discovers a rubric that is highly predictive of a paper's future impact as measured by citation counts. Stage 2 - Paper Improvement (bottom right): the discovered rubric from stage 1 is used as a guide for iteratively revising and enhancing a paper through another agentic search process.
Figure 2: Performance of agentic search (MultiAIDE) in predicting citation counts, measured in Mean Absolute Error (MAE). Our MultiAIDE search approach converges to a lower MAE compared to several strong baselines, including a model using human reviewer scores (Human scores baseline), an MLP on SPECTER paper embeddings, a negative binomial model on the principal components of SPECTER embeddings (Paper embedding + PCA), and an alternative search method (Prompt breeder). The x-axis represents the number of iterations for search methods.
Figure 3: The word cloud generated from participants' reasons for preferring a revised paper.
Figure 4: Improvement in predicted paper quality from our iterative revision process using OpenAI-o3 and Gemini 2.5 Pro LLMs. Our agentic 'Rewriter' yields larger score improvements for both borderline and clear-reject papers. The higher final scores obtained for borderline papers suggest that our rubric discovery approach is more effective at fixing presentational flaws than fundamental scientific ones. The y-axis shows the average of the scores across all rubric items.
Figure A1: Glicko2 ratings from OpenAI-o1.
...and 5 more figures
