GPT Editors, Not Authors: The Stylistic Footprint of LLMs in Academic Preprints
Soren DeHaan, Yuanze Liu, Johan Bollen, Saúl A. Blanco
TL;DR
The paper investigates whether LLMs are used in academic preprints and, if so, whether they are used for editing and translation or for full generation. It introduces a hybrid method that combines a naive Bayes classifier over word frequencies, scored by per-word log-odds $LogOdds(W)$, with Pruned Exact Linear Time (PELT) changepoint detection to quantify stylistic segmentation along a manuscript. Applying the approach to arXiv papers and to text regenerated with GPT-3.5 Turbo, the study finds that LLM usage is largely uniform within a manuscript and is predominantly editing, with partial generation being uncommon; once segmentation is normalized by manuscript length, the apparent link between LLM signals and segmentation disappears. The findings have policy implications: they support responsible disclosure and the use of LLMs as editing tools in scientific writing, while maintaining vigilance against misuse.
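As a rough illustration of the word-frequency scoring described above, the following sketch computes a smoothed log-odds score per word from two small corpora. The exact definition of $LogOdds(W)$ is not given in this summary, so the add-one smoothing, the whitespace tokenizer, the mean-per-word aggregation, and the names `log_odds_table` and `score_text` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def log_odds_table(llm_docs, human_docs, alpha=1.0):
    """Map each word to log P(word | LLM) - log P(word | human).

    Probabilities use additive (Laplace) smoothing with strength `alpha`,
    which is an assumption of this sketch.
    """
    llm_counts = Counter(w for doc in llm_docs for w in doc.lower().split())
    human_counts = Counter(w for doc in human_docs for w in doc.lower().split())
    vocab = set(llm_counts) | set(human_counts)

    # Smoothed denominators: total tokens plus alpha per vocabulary entry.
    llm_total = sum(llm_counts.values()) + alpha * len(vocab)
    human_total = sum(human_counts.values()) + alpha * len(vocab)

    return {
        w: math.log((llm_counts[w] + alpha) / llm_total)
        - math.log((human_counts[w] + alpha) / human_total)
        for w in vocab
    }


def score_text(text, table):
    """Mean log-odds over the known words in `text` (0.0 if none are known).

    Positive values lean toward the LLM corpus, negative toward the human one.
    """
    known = [w for w in text.lower().split() if w in table]
    if not known:
        return 0.0
    return sum(table[w] for w in known) / len(known)
```

Scoring consecutive windows of a manuscript with `score_text` would yield a one-dimensional signal along the document, which is the kind of input a changepoint detector operates on.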
Abstract
The proliferation of Large Language Models (LLMs) in late 2022 has impacted academic writing, threatening credibility and causing institutional uncertainty. We seek to determine the degree to which LLMs are used to generate critical text as opposed to being used for editing, such as checking for grammar errors or inappropriate phrasing. In our study, we analyze arXiv papers for stylistic segmentation, which we measure by varying a PELT threshold against a Bayesian classifier trained on GPT-regenerated text. We find that LLM-attributed language is not predictive of stylistic segmentation, suggesting that when authors use LLMs, they do so uniformly, which reduces the risk of hallucinations being introduced into academic preprints.
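To make the role of the PELT threshold concrete, here is a minimal pure-Python sketch of PELT applied to a one-dimensional signal such as per-window classifier scores. The cost function used in the study is not stated in this abstract, so the squared-error (mean-shift) cost, the function name `pelt_mean_shift`, and the `penalty` parameter standing in for the threshold are assumptions of this sketch.

```python
def pelt_mean_shift(signal, penalty):
    """Return changepoint indices for `signal` using PELT.

    Each segment is charged its sum of squared deviations from the segment
    mean, plus `penalty` per segment. A returned index i means a new segment
    starts at signal[i].
    """
    n = len(signal)

    # Prefix sums allow any segment cost to be computed in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, y in enumerate(signal):
        s1[i + 1] = s1[i] + y
        s2[i + 1] = s2[i] + y * y

    def cost(a, b):
        total = s1[b] - s1[a]
        # Clamp at zero to absorb floating-point rounding.
        return max(0.0, (s2[b] - s2[a]) - total * total / (b - a))

    best = [0.0] * (n + 1)   # best[t]: optimal penalized cost of signal[:t]
    best[0] = -penalty       # so that the first segment's penalty cancels
    last = [0] * (n + 1)     # last[t]: start of the final segment in that optimum
    candidates = [0]

    for t in range(1, n + 1):
        values = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(values)
        # Pruning step: discard starts that can never become optimal later.
        candidates = [s for v, s in values if v - penalty <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts to recover changepoints.
    changepoints = []
    t = n
    while t > 0:
        t = last[t]
        if t > 0:
            changepoints.append(t)
    return changepoints[::-1]
```

Raising `penalty` makes each additional segment more expensive, so sweeping it from low to high gives a picture of how strongly a manuscript is segmented: a stylistically uniform document loses its changepoints quickly, while a document with a genuinely different section keeps them over a wide range of penalties.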
