Author-in-the-Loop Response Generation and Evaluation: Integrating Author Expertise and Intent in Responses to Peer Review

Qian Ruan; Iryna Gurevych

TL;DR

This work reframes author response generation (ARG) as an author-in-the-loop task and makes three core contributions: REspGen, a modular ARG framework that incorporates explicit author input, controllable planning and length, and evaluation-guided refinement; Re$^3$Align, the first large-scale dataset of aligned review–response–revision triplets; and REspEval, a comprehensive evaluation suite with 20+ metrics spanning controllability, input utilization, discourse, and response quality. Across five state-of-the-art LLMs and nine generation settings, the authors show that author input and evaluation-guided refinement improve response quality and alignment with reviewer concerns. The experiments also reveal two trade-offs: one between richer author context and focus on core improvements, and one between single- and multi-attribute controllability. The dataset, generation framework, and evaluation tools provide a foundation for future NLP research on publishable, author-aligned rebuttal writing and on broader human–AI collaboration in scholarly communication. The findings underscore the value of explicitly modeling author expertise and intent to produce more concrete, persuasive, and trustworthy ARG outputs while preserving essential human involvement in peer review.

Abstract

Author response (rebuttal) writing is a critical stage of scientific peer review that demands substantial author effort. Recent work frames this task as automatic text generation, underusing author expertise and intent. In practice, authors possess domain expertise, author-only information, and revision and response strategies (concrete forms of author expertise and intent) to address reviewer concerns, and they seek NLP assistance that integrates these signals to support effective response writing in peer review. We reformulate author response generation as an author-in-the-loop task and introduce REspGen, a generation framework that integrates explicit author input, multi-attribute control, and evaluation-guided refinement, together with REspEval, a comprehensive evaluation suite with 20+ metrics covering input utilization, controllability, response quality, and discourse. To support this formulation, we construct Re$^3$Align, the first large-scale dataset of aligned review–response–revision triplets, where revisions provide signals of author expertise and intent. Experiments with state-of-the-art LLMs show the benefits of author input and evaluation-guided refinement, the impact of input design on response quality, and trade-offs between controllability and quality. We make our dataset and our generation and evaluation tools publicly available.

Paper Structure (40 sections, 1 equation, 13 figures, 14 tables)

Introduction
Related Work
Dataset Construction: Re$^3$Align
Data Collection and Preprocessing
Review-Response Pair Alignment and Revision Annotation
Re$^3$ Triplet Alignment
Generation Framework: REspGen
Response Attribute Control
Input Component Configuration
Evaluation-guided Refinement
Evaluation Framework: REspEval
Response Discourse Analysis
Controllability Evaluation
Input Utilization Measures
Response Quality Evaluation
...and 25 more sections

Figures (13)

Figure 1: In this work, we contribute (1) REspGen, an author-in-the-loop ARG framework that integrates explicit author input (d), controllable planning and length (b–c), and additional paper context (e); (2) Re$^3$Align, the first large-scale review–response–revision triplets dataset for modeling author signals; and (3) REspEval, a comprehensive response evaluation framework with over 20 metrics spanning four dimensions.
Figure 2: Frameworks: REspGen & REspEval.
Figure 3: Changes in Specificity after refinement across five LLMs. Colors indicate increase (green), no change (yellow), or decrease (red); the first bar shows overall proportions, followed by distributions by initial score.
Figure 4: An illustrative example of segment-level review-response pair matching.
Figure 5: Optimized prompt to itemize review segments and label response actions.
...and 8 more figures
