Perceptions and Detection of AI Use in Manuscript Preparation for Academic Journals

Nir Chemaya; Daniel Martin

TL;DR

The paper examines whether academics think AI use in manuscript preparation should be disclosed and how AI detectors respond to AI-assisted revisions. It combines a survey of 271 academics with a detection experiment in which 2,716 Management Science abstracts are revised using GPT-3.5 and then scored for AI likelihood with Originality.ai. Survey respondents more often say that rewriting with AI should be reported than grammar fixing, yet detectors often flag grammar-fixed text as AI-generated. Perceptions of ethics and native-language background shape reporting norms, with notable heterogeneity across respondents. These results bear on disclosure requirements and detector-based enforcement in scholarly publishing, and they point to the need for validation across fields, detectors, and prompts.

Abstract

The emergent abilities of Large Language Models (LLMs), which power tools like ChatGPT and Bard, have produced both excitement and worry about how AI will impact academic writing. In response to rising concerns about AI use, authors of academic publications may decide to voluntarily disclose any AI tools they use to revise their manuscripts, and journals and conferences could begin mandating disclosure and/or turn to using detection services, as many teachers have done with student writing in class settings. Given these looming possibilities, we investigate whether academics view it as necessary to report AI use in manuscript preparation and how detectors react to the use of AI in academic writing.

Paper Structure (12 sections, 11 figures, 6 tables)

This paper contains 12 sections, 11 figures, 6 tables.

Introduction
Methods
Survey Design
Detection Design
Results
Reporting Views and Detection Evaluations
Heterogeneity of Perceptions
Detection Robustness
Discussion
Limitations and Future Directions
Survey Screenshots
Additional Tables

Figures (11)

Figure 1: Reporting views (a) vs. detection results (b).
Figure 2: The distribution of AI scores for the original abstracts (a), abstracts revised using the Grammar 1 prompt (b), and abstracts revised using the Rewrite 1 prompt (c).
Figure 3: Fraction of survey respondents indicating that ChatGPT, RA, Grammarly, and Word use in fixing grammar or ChatGPT, RA, or Proofreading use in rewriting text should be reported, with 95% confidence intervals.
Figure 4: Fraction of survey respondents indicating that ChatGPT use in fixing grammar or rewriting text should be reported, with 95% confidence intervals, by English background, academic role, and perceptions of ethics.
Figure 5: 75th percentile of AI score for original abstracts and the versions that were revised by GPT-3.5 for all of the prompts.
...and 6 more figures
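
The figure captions above rely on two summary statistics: a fraction of respondents with a 95% confidence interval (Figures 3 and 4) and the 75th percentile of AI scores (Figure 5). The following is a minimal sketch of how such quantities could be computed. It assumes a normal-approximation (Wald) interval and an inclusive quantile method, neither of which is confirmed by the text here, and the function names and input numbers are hypothetical illustrations, not the paper's data.

```python
import math
import statistics


def proportion_ci(successes, n, z=1.96):
    """Return (fraction, lower, upper) using a normal-approximation interval.

    Assumption: a Wald interval with z = 1.96 for roughly 95% coverage.
    The interval method actually used for the figures is not stated here.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because a fraction cannot fall outside that range.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def percentile_75(scores):
    """Return the 75th percentile (third quartile) of a list of scores.

    Assumption: the inclusive method, which interpolates linearly
    between the observed minimum and maximum.
    """
    return statistics.quantiles(scores, n=4, method="inclusive")[2]


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    frac, lower, upper = proportion_ci(successes=50, n=100)
    print(f"fraction={frac:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
    print("75th percentile:", percentile_75([0, 1, 2, 3, 4]))
```

For small samples or fractions near 0 or 1, a Wilson or exact interval is usually a better choice than the normal approximation sketched here.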
