Automated Focused Feedback Generation for Scientific Writing Assistance
Eric Chamoun, Michael Schlichtkrull, Andreas Vlachos
TL;DR
Existing writing-assistance tools mostly improve surface-level clarity rather than manuscript content. The paper addresses this gap by introducing the task of automated focused feedback generation for scientific writing. It proposes SWIF$^{2}$T, a four-component system (planner, investigator, reviewer, controller) that gathers context from the target paragraph, the rest of the paper, and related literature to produce specific, actionable feedback, and that uses a plan re-ranking module to improve plan quality. The authors assemble a dataset of 2,581 paragraph–review pairs from multiple sources and report, through both human and automatic evaluations, that SWIF$^{2}$T's feedback is more specific, reflects better reading comprehension, and is more useful overall than feedback from strong baselines such as GPT-4 and CoVe. In some instances the AI-generated feedback was judged better than human feedback. They also discuss limitations related to literature retrieval, potential hallucinations, and high computational cost, and outline future work on integrating AI-assisted feedback more broadly into scientific writing workflows and tools.
Abstract
Scientific writing is a challenging task, particularly for novice researchers who often rely on feedback from experienced peers. Recent work has primarily focused on improving surface form and style rather than manuscript content. In this paper, we propose a novel task: automated focused feedback generation for scientific writing assistance. We present SWIF$^{2}$T: a Scientific WrIting Focused Feedback Tool. It is designed to generate specific, actionable and coherent comments, which identify weaknesses in a scientific paper and/or propose revisions to it. Our approach consists of four components - planner, investigator, reviewer and controller - leveraging multiple Large Language Models (LLMs) to implement them. We compile a dataset of 300 peer reviews citing weaknesses in scientific papers and conduct human evaluation. The results demonstrate the superiority in specificity, reading comprehension, and overall helpfulness of SWIF$^{2}$T's feedback compared to other approaches. In our analysis, we also identified cases where automatically generated reviews were judged better than human ones, suggesting opportunities for integration of AI-generated feedback in scientific writing.
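The division of labour among the four components can be illustrated with a minimal sketch. It is not the authors' implementation: every function here (`plan`, `investigate`, `review`, `controller`) and the `Context` type are hypothetical stand-ins, the LLM calls are replaced by deterministic string handling, and the plan re-ranking step is omitted. The only assumption taken from the text is that a planner proposes steps, an investigator gathers evidence from the paragraph, the paper, and the literature, a reviewer writes the comment, and a controller coordinates them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Context:
    """The three context sources named in the summary (hypothetical container)."""
    paragraph: str
    paper: str
    literature: List[str] = field(default_factory=list)


def plan(paragraph: str) -> List[str]:
    """Hypothetical planner: emits one question per context source.

    A real planner would prompt an LLM; this stub returns a fixed plan.
    """
    return [
        "paragraph: what does the paragraph claim?",
        "paper: is the claim supported elsewhere in the paper?",
        "literature: does related work agree with the claim?",
    ]


def investigate(step: str, ctx: Context) -> str:
    """Hypothetical investigator: looks up the source named before the colon."""
    source = step.split(":", 1)[0]
    sources = {
        "paragraph": ctx.paragraph,
        "paper": ctx.paper,
        "literature": " ".join(ctx.literature),
    }
    # Truncate the evidence so the finding stays short.
    return f"[{source}] {sources[source][:60]}"


def review(paragraph: str, findings: List[str]) -> str:
    """Hypothetical reviewer: turns the collected findings into one comment."""
    return "Feedback based on: " + "; ".join(findings)


def controller(ctx: Context) -> str:
    """Hypothetical controller: runs each plan step, then asks for the review."""
    findings = [investigate(step, ctx) for step in plan(ctx.paragraph)]
    return review(ctx.paragraph, findings)
```

Calling `controller(Context("We beat all baselines.", "Table 2 reports results.", ["Prior work X."]))` returns a single string that lists one finding per source. In a real system each stub would wrap a model call, and the controller would also decide when the plan is complete.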
