
EditLens: Quantifying the Extent of AI Editing in Text

Katherine Thai, Bradley Emi, Elyas Masrour, Mohit Iyyer

TL;DR

EditLens addresses the need to quantify AI involvement in text editing on a continuous scale rather than with a binary label. Modeling homogeneous mixed text as y = 𝓔_λ(x; z) and using similarity-based targets for the change magnitude Δ(x, y), the authors train a regression head that maps an edited text y to a score reflecting the extent of AI editing. They construct a large homogeneous mixed-text dataset with synthetic AI edits and derive two intermediate supervision metrics: cosine similarity of Linq-Embed-Mistral embeddings, and soft n-gram similarity. They validate these metrics against human judgments, then fine-tune a Mistral/Llama backbone with QLoRA. EditLens achieves state-of-the-art performance on binary and ternary detection tasks. It generalizes to unseen prompts, to unseen domains, and to human-edited AI text, and the paper includes case studies on Grammarly and BEEMO. The approach supports more graduated policies on AI use and reduces false positives. The authors also contribute a publicly available dataset and models to spur further research on measuring AI usage in writing.
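The setup in the TL;DR can be written compactly as below. This is a sketch under stated assumptions, not the paper's exact formulation:

- f is the embedding model used for the cosine target; the summary names Linq-Embed-Mistral.
- z is read as the edit instruction and λ as the edit intensity. This is an assumed reading.
- g_θ is the regression head, which sees only the edited text y.
- The squared-error loss is an illustrative assumption, since the summary does not state the training objective.

```latex
\begin{aligned}
y &= \mathcal{E}_\lambda(x; z) \\
\Delta(x, y) &= 1 - \frac{f(x)^\top f(y)}{\lVert f(x) \rVert \, \lVert f(y) \rVert} \\
\hat{\Delta}(y) &= g_\theta(y), \qquad
\mathcal{L}(\theta) = \big( g_\theta(y) - \Delta(x, y) \big)^2
\end{aligned}
```

The key point is that x is needed only to build the training target. At inference time the score comes from y alone.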

Abstract

A significant proportion of queries to large language models ask them to edit user-provided text, rather than generate new text from scratch. While previous work focuses on detecting fully AI-generated text, we demonstrate that AI-edited text is distinguishable from human-written and AI-generated text. First, we propose using lightweight similarity metrics to quantify the magnitude of AI editing present in a text given the original human-written text and validate these metrics with human annotators. Using these similarity metrics as intermediate supervision, we then train EditLens, a regression model that predicts the amount of AI editing present within a text. Our model achieves state-of-the-art performance on both binary (F1=94.7%) and ternary (F1=90.4%) classification tasks in distinguishing human, AI, and mixed writing. Not only do we show that AI-edited text can be detected, but also that the degree of change made by AI to human writing can be detected, which has implications for authorship attribution, education, and policy. Finally, as a case study, we use our model to analyze the effects of AI edits applied by Grammarly, a popular writing assistance tool. To encourage further research, we commit to publicly releasing our models and dataset.
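A minimal sketch of an embedding-based supervision target is shown below. It computes the cosine distance between the embedding of the original human text and that of its edited version, the same quantity Figure 1 reports. The embedding vectors are hypothetical toy values. In practice they would come from an embedding model such as Linq-Embed-Mistral, which this sketch does not call.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two embedding vectors (range [0, 2])."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-norm embedding")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy embeddings standing in for f(x) (human source) and f(y) (AI-edited text).
emb_source = [0.2, 0.9, 0.1]
emb_edited = [0.25, 0.85, 0.2]

# A small distance corresponds to a light edit; larger distances indicate heavier rewriting.
target = cosine_distance(emb_source, emb_edited)
```

Because the target depends on both x and y, it can only be computed at training time, when the original human text is available.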

Paper Structure

This paper contains 50 sections, 11 equations, 11 figures, 17 tables.

Figures (11)

  • Figure 1: AI edits exist on a continuous spectrum from fully human written to fully AI generated. Here we show three versions of the same human-written text after different edits have been applied by an LLM, alongside the cosine distance between each edited text and the fully human text. Texts have been truncated for space. "Fix any mistakes," the mildest edit according to cosine distance, results in a text with only spelling and grammar errors corrected, while "Make it more descriptive" closely adheres to the ideas in the human-written text while substantially rewriting it.
  • Figure 2: Examples of heterogeneous and homogeneous mixed authorship texts. In heterogeneous mixed text, authorship of each token is clearly attributable. But in homogeneous mixed text, the human-originated ideas are clearly present in each rewritten sentence by the model, making it impossible to assign binary labels of authorship to any word or sentence.
  • Figure 3: EditLens architecture. We generate fully AI and AI-edited versions of human source texts, then use lightweight similarity metrics as intermediate supervision. We partition the texts into $n$ buckets according to their supervision score and experiment with training both a regression model and $n$-way classification models, using weighted-average decoding to obtain a numerical score from the classifiers.
  • Figure 4: Distributions for EditLens and Pangram on the AI Polish dataset (saha2025almost). Pangram overwhelmingly tends to predict a score of either 0 or 1, while EditLens captures the increasing levels of AI polish applied to the texts.
  • Figure 5: "Trajectory" of EditLens scores after subsequent AI edits to a single text. We can observe that the mean score predicted by EditLens after each edit is monotonically increasing.
  • ...and 6 more figures
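The bucketing and weighted-average decoding described in the Figure 3 caption can be sketched as follows. The sketch assumes the following, none of which the caption specifies:

- Supervision scores are normalized to [0, 1].
- The n buckets have equal width.
- Each bucket is represented by its midpoint.

The paper may bucket and decode differently.

```python
from typing import List, Sequence


def bucket_index(score: float, n: int) -> int:
    """Map a supervision score in [0, 1] to one of n equal-width buckets (assumed scheme)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # min() keeps score == 1.0 in the last bucket instead of overflowing to index n.
    return min(int(score * n), n - 1)


def bucket_centers(n: int) -> List[float]:
    """Midpoint of each equal-width bucket, e.g. n=4 -> [0.125, 0.375, 0.625, 0.875]."""
    return [(i + 0.5) / n for i in range(n)]


def weighted_average_decode(probs: Sequence[float]) -> float:
    """Collapse an n-way class distribution into one score: sum_i p_i * center_i."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    centers = bucket_centers(len(probs))
    # Dividing by total renormalizes in case the probabilities do not sum exactly to 1.
    return sum(p * c for p, c in zip(probs, centers)) / total


# Usage: a 4-way classifier output leaning toward heavier editing.
score = weighted_average_decode([0.1, 0.2, 0.3, 0.4])  # approximately 0.625
```

Decoding this way yields a continuous score from a classifier rather than a hard bucket label. This matches the caption's motivation for comparing classification models against direct regression.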