Accepted with Minor Revisions: Value of AI-Assisted Scientific Writing

Sanchaita Hazra, Doeun Lee, Bodhisattwa Prasad Majumder, Sachin Kumar

TL;DR

This study conducts a rigorous, incentivized randomized controlled trial to evaluate how the origin of an abstract (AI-generated vs human-written) and the disclosure of that origin influence author editing behavior and review outcomes. The authors collect fine-grained keystroke-level edits, compare edits across provenance and disclosure conditions, and analyze reviewer accept/reject decisions in a simulated conference setting. Key findings include that authors edit AI-generated abstracts less when provenance is undisclosed, but disclosure can trigger social and structural edits that modestly improve acceptance; reviewer decisions remain largely unaffected by provenance alone. The work combines behavioral economics, linguistic style analysis, and qualitative interviews to reveal how AI provenance and transparency shape scientific writing practices and editorial accountability, offering a framework for evaluating AI-assisted writing in scholarly communication. It highlights the potential of AI-generated abstracts to reach comparable acceptance with careful editing, while underscoring the importance of disclosure and authors' attitudes toward AI in shaping editing behavior and outcomes.

Abstract

Large Language Models have seen expanding application across domains, yet their effectiveness as assistive tools for scientific writing -- an endeavor requiring precision, multimodal synthesis, and domain expertise -- remains insufficiently understood. We examine the potential of LLMs to support domain experts in scientific writing, with a focus on abstract composition. We design an incentivized randomized controlled trial with a hypothetical conference setup where participants with relevant expertise are split into an author pool and a reviewer pool. Inspired by methods in behavioral science, our novel incentive structure encourages authors to edit the provided abstracts to an acceptable quality for a peer-reviewed submission. Our 2x2 between-subject design varies two dimensions: the implicit source of the provided abstract and the disclosure of that source. We find that, without source attribution, authors make more edits to human-written abstracts than to AI-generated ones, often guided by the higher perceived readability of AI-generated text. Upon disclosure of source information, the volume of edits converges across both source treatments. Reviewer decisions remain unaffected by the source of the abstract, but correlate significantly with the number of edits made. Careful stylistic edits, especially to AI-generated abstracts when source information is disclosed, improve the chance of acceptance. We find that AI-generated abstracts hold potential to reach levels of acceptability comparable to human-written ones with minimal revision, and that perceptions of AI authorship, rather than objective quality, drive much of the observed editing behavior. Our findings underscore the significance of source disclosure in collaborative scientific writing.

Paper Structure

This paper contains 47 sections, 17 figures, 2 tables.

Figures (17)

  • Figure 1: Curious case of scientific writing: We find that expert authors (e.g., with a PhD) make the most edits when provided with human-written counterparts of AI-generated abstracts, especially when the source of the abstracts remains unattributed. With attribution, we see the opposite trend: authors make careful stylistic edits when the abstract is known to be AI-generated, which often raises the chance of the abstract being accepted.
  • Figure 2: Experimental workflow: Authors from a shared pool were randomly assigned to one of four treatments varying abstract source (AI-generated vs. human-written) and information disclosure (with vs. without source information). Authors revised abstracts in the editing interface. Each edited abstract was randomly assigned to three reviewers for their individual verdict. Reviewers see the final edited version of abstracts without knowing what edits were made. A majority vote decides the final verdict.
  • Figure 3: The first detailed pictorial instruction, in which experimenters show authors the author panel and tell them explicitly that edits must be made directly in the interface. In several pilot studies, we found that recruited authors were copy-pasting the provided abstract and making edits elsewhere. Providing these graphic instructions and additional screening questions helped mitigate this problem and enabled capturing keystroke-level edits for every abstract.
  • Figure 4: Illustration of prohibited actions during the abstract editing task. In this second schematic illustration, authors are instructed not to copy the provided abstract into external tools (e.g., text editors, AI assistants), make edits outside the designated experimental interface, or paste modified content back into the system. We inform authors that such actions violate the experimental protocol and may lead to exclusion from the study.
  • Figure 5: Patterns of editing effort in Study 1 (N = 495 abstracts). Left: Distribution of Levenshtein edit distance in two conditions: Human-noInfo (blue) and AI-noInfo (red). Black squares mark the sample means ± 95% CI; Welch's unequal-variance test indicates a statistically reliable reduction in edits for AI abstracts (p = 0.0293). Middle: Mean edit distance broken down by the authors' highest education level. Authors with undergraduate or graduate degrees make markedly larger edits to human text than to AI text, whereas doctorate-level authors show the opposite pattern. Right: Scatter plot of each abstract’s perceived readability (0–100) vs. the author’s confidence in their edits (0–100). The positive slopes show that authors feel more confident when an abstract reads more smoothly; readability, not AI as the source, explains authors’ higher confidence in the AI-noInfo condition.
  • ...and 12 more figures
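
The Figure 5 caption measures editing effort with Levenshtein edit distance and compares two conditions with Welch's unequal-variance test. The sketch below illustrates both quantities using only the Python standard library. It is not the authors' code: the character-level granularity, the function names `levenshtein` and `welch_t`, and the sample inputs are assumptions made for illustration, and the paper may compute distance over tokens or keystrokes instead.

```python
from math import sqrt
from statistics import mean, variance


def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein distance (assumed granularity),
    computed with the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution
        prev = curr
    return prev[-1]


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical usage: distances between provided and edited abstracts,
# grouped by condition (the strings here are placeholders, not study data).
d = levenshtein("We propose a method.", "We propose a new method.")
t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
```

Turning the statistic into a p-value requires the Student t distribution with `df` degrees of freedom, which the standard library does not provide; a statistics package would typically be used for that step.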