Comparing human and LLM proofreading in L2 writing: Impact on lexical and syntactic features
Hakyung Sung, Karla Csuros, Min-Chang Sung
TL;DR
This study investigates how human and large language model (LLM) proofreading affect lexical and syntactic features in second-language (L2) writing. Using the ICNALE Edited Essays dataset and three LLMs (GPT-4o, Llama3.1-8b, Deepseek-r1-8b) with a standardized proofreader prompt, the authors quantify changes via 49 lexical and 143 syntactic indices and analyze them with linear mixed-effects models. Results show that both proofreading modalities improve bigram cohesion, but LLM proofreading also boosts lexical diversity and sophistication and induces more extensive syntactic edits, such as adding clauses and nominalizations. Effects are highly consistent across the three LLMs, which suggests that this lexical and syntactic augmentation is a general pattern of LLM proofreading rather than a quirk of one model. The pattern brings benefits for intelligibility but also risks of meaning shifts and loss of authorial voice. For practical use of LLMs in L2 writing, the findings point to careful application and monitoring of edits, so that fluency gains are balanced against fidelity to the writer’s intent.
Abstract
This study examines the lexical and syntactic interventions that human and LLM proofreading make to the same second-language texts when the goal is to improve overall intelligibility, and evaluates the consistency of outcomes across three LLMs (ChatGPT-4o, Llama3.1-8b, Deepseek-r1-8b). Findings show that both human and LLM proofreading enhance bigram lexical features, which may contribute to better coherence and contextual connectedness between adjacent words. However, LLM proofreading exhibits a more generative approach, extensively reworking vocabulary and sentence structures, for example by employing more diverse and sophisticated vocabulary and incorporating a greater number of adjective modifiers in noun phrases. The proofreading outcomes are highly consistent in major lexical and syntactic features across the three models.
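
To make the notion of "more diverse vocabulary" concrete, the sketch below computes a type-token ratio (TTR), one simple and widely used lexical diversity measure, for a text before and after editing. It is an illustration only: the abstract does not say which indices or tools the study used, the two sentences are invented for this example, and the function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase and split into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(text):
    """Distinct word types divided by total tokens; 0.0 for empty text."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Invented example sentences (assumption: not taken from the ICNALE data).
original = "I think it is good. I think it is very good for students."
edited = "I believe it is beneficial. In my view, it greatly helps students."

ttr_original = type_token_ratio(original)  # 8 types over 13 tokens
ttr_edited = type_token_ratio(edited)      # 11 types over 12 tokens

print(f"original TTR: {ttr_original:.3f}")
print(f"edited TTR:   {ttr_edited:.3f}")
```

Raw TTR is sensitive to text length, so comparisons like this are only meaningful for texts of similar size; length-robust measures are generally preferred for real analyses.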
