Automated Focused Feedback Generation for Scientific Writing Assistance

Eric Chamoun; Michael Schlichtkrull; Andreas Vlachos

TL;DR

Existing writing-assistance tools mostly address surface-level clarity rather than manuscript content. This paper addresses that gap by introducing the task of automated focused feedback generation for scientific writing. It proposes SWIF$^{2}$T, a four-agent system (planner, investigator, reviewer, controller) that gathers context from the target paragraph, the rest of the paper, and related literature to produce specific, actionable feedback, with a plan re-ranking module to improve plan quality. The authors assemble a dataset of 2,581 paragraph–review pairs from multiple sources and report, through both human and automatic evaluations, that SWIF$^{2}$T's feedback is more specific, reflects better reading comprehension, and is more helpful overall than strong baselines such as GPT-4 and CoVe, including cases where AI-generated feedback was judged better than human feedback. They also discuss limitations related to literature retrieval, potential hallucinations, and high computational cost, and outline future work on integrating AI-assisted feedback into scientific writing workflows and tools.

Abstract

Scientific writing is a challenging task, particularly for novice researchers who often rely on feedback from experienced peers. Recent work has primarily focused on improving surface form and style rather than manuscript content. In this paper, we propose a novel task: automated focused feedback generation for scientific writing assistance. We present SWIF$^{2}$T: a Scientific WrIting Focused Feedback Tool. It is designed to generate specific, actionable and coherent comments, which identify weaknesses in a scientific paper and/or propose revisions to it. Our approach consists of four components - planner, investigator, reviewer and controller - leveraging multiple Large Language Models (LLMs) to implement them. We compile a dataset of 300 peer reviews citing weaknesses in scientific papers and conduct human evaluation. The results demonstrate the superiority in specificity, reading comprehension, and overall helpfulness of SWIF$^{2}$T's feedback compared to other approaches. In our analysis, we also identified cases where automatically generated reviews were judged better than human ones, suggesting opportunities for integration of AI-generated feedback in scientific writing.
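
The four components named in the abstract can be pictured as a small control loop. The sketch below is illustrative only and is not the authors' implementation: every function name, the `Context` class, and the plan-scoring rule (plan length) are assumptions made for this example, and each stub stands in for what would be an LLM call in the real system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Context:
    """Hypothetical container for the paragraph under review and gathered evidence."""
    paragraph: str
    paper: str
    findings: List[str] = field(default_factory=list)


def planner(ctx: Context) -> List[List[str]]:
    # Stub for an LLM call: propose candidate plans, each a list of questions.
    return [
        ["Is the main claim supported elsewhere in the paper?"],
        ["Is the main claim supported elsewhere in the paper?",
         "Does related literature contradict the claim?"],
    ]


def rerank(plans: List[List[str]], score: Callable[[List[str]], float]) -> List[str]:
    # Pick the highest-scoring plan; the scoring function is a placeholder.
    return max(plans, key=score)


def investigator(question: str, ctx: Context) -> str:
    # Stub: a real investigator would query the paper or retrieve literature.
    source = "literature" if "literature" in question else "paper"
    return f"[{source}] {question}"


def reviewer(ctx: Context) -> str:
    # Stub: compose feedback from the gathered findings.
    return f"Feedback ({len(ctx.findings)} findings): " + "; ".join(ctx.findings)


def controller(ctx: Context, score: Callable[[List[str]], float] = len) -> str:
    # Coordinate the other components: plan, re-rank, investigate, review.
    plan = rerank(planner(ctx), score)
    for step in plan:
        ctx.findings.append(investigator(step, ctx))
    return reviewer(ctx)


feedback = controller(Context(paragraph="Our method is state of the art.", paper="..."))
print(feedback)
```

Scoring plans by length is only a stand-in to keep the example deterministic; the paper's plan re-ranking module is a separate component whose details are not given in this summary.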

Paper Structure (46 sections, 1 equation, 8 figures, 10 tables)

Introduction
Related work
Scientific reviewing
Text revision
Scientific paper-review discourse analysis
Focused feedback generation
SWIF$^{2}$T
Description
Plan re-ranking
Implementation
Dataset compilation
Modelling details
Plan re-ranking
Investigator
Reviewer
...and 31 more sections

Figures (8)

Figure 1: Overview of the proposed task.
Figure 2: Illustration of our approach.
Figure 3: Example of a plan generated by SWIF$^{2}$T. We include an example of an entire run in Appendix \ref{app:run-example}.
Figure 4: Each comparison involves a paragraph and two reviews. Reviews are randomized at the start of the annotation and no information about the source of the review is provided to prevent bias.
Figure 5: A link is provided to the annotator for convenient access to the version of the paper with the paragraph under review.
...and 3 more figures
