Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore

Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xuebo Liu, Lidia S. Chao, Min Zhang

TL;DR

This work introduces GECScore, a black-box zero-shot detector for LLM-generated text. It uses grammatical error correction (GEC) to distinguish human-written from machine-generated content, and it needs neither access to the source model nor large-scale training data. The input text is corrected with a GEC model, and the similarity between the original and corrected versions is measured. LLM-generated texts yield higher similarity because they tend to be grammatically cleaner and more consistent with the correction preferences of LLMs, so the GEC model changes them less. Extensive experiments on XSum and Writing Prompts show state-of-the-art AUROC (about 98.6% on average), strong generalization across domains and source models, and robustness to paraphrase attacks, outperforming both zero-shot and supervised baselines. The method proves reliable in the wild and offers a practical, efficient solution for real-world LLM-generated text detection. Data and code are released for replication.
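To make the scoring step concrete, here is a minimal Python sketch of the idea under stated assumptions. The GEC model is passed in as any function that maps text to corrected text. The `toy_correct` function below is a hypothetical stand-in that fixes a few hard-coded errors and is not a real GEC model. The similarity metric is assumed to be a ROUGE-2-style bigram F1 over lower-cased whitespace tokens. The paper's actual GEC model and similarity metric may differ.

```python
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    # Assumption: simple lower-cased whitespace tokenization.
    return text.lower().split()


def bigram_f1(reference: str, candidate: str) -> float:
    """ROUGE-2-style F1 (an assumed stand-in for the paper's Sim metric)."""
    ref_tokens, cand_tokens = tokenize(reference), tokenize(candidate)
    ref = Counter(zip(ref_tokens, ref_tokens[1:]))
    cand = Counter(zip(cand_tokens, cand_tokens[1:]))
    if not ref or not cand:
        # Too short to form bigrams: fall back to exact token match.
        return 1.0 if ref_tokens == cand_tokens else 0.0
    overlap = sum((ref & cand).values())  # multiset intersection of bigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def gec_score(text: str, correct: Callable[[str], str]) -> float:
    """Similarity between a text and its grammar-corrected version.

    Higher means the corrector changed less, i.e. more likely LLM-generated.
    """
    return bigram_f1(text, correct(text))


def toy_correct(text: str) -> str:
    """Hypothetical GEC stand-in: fixes a few known errors by substring replacement."""
    fixes = {"dont": "don't", "recieve": "receive", "their is": "there is"}
    for wrong, right in fixes.items():
        text = text.replace(wrong, right)
    return text


human_text = "i dont think we will recieve it today"
llm_text = "I do not think we will receive it today."
print(f"{gec_score(human_text, toy_correct):.3f}")  # 0.429
print(f"{gec_score(llm_text, toy_correct):.3f}")    # 1.000
```

In practice `toy_correct` would be replaced by a real GEC system. The scoring logic stays the same, which is what makes the approach black-box: only the text and its corrected version are needed.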

Abstract

The efficacy of detectors for texts generated by large language models (LLMs) substantially depends on the availability of large-scale training data. However, white-box zero-shot detectors, which require no such data, are limited by their need to access the source model that generated the text. In this paper, we propose a simple yet effective black-box zero-shot detection approach based on the observation that, from the perspective of LLMs, human-written texts typically contain more grammatical errors than LLM-generated texts. The approach calculates the Grammar Error Correction Score (GECScore) of a given text to differentiate between human-written and LLM-generated text. Experimental results show that our method outperforms current state-of-the-art (SOTA) zero-shot and supervised methods, achieving an average AUROC of 98.62% across the XSum and Writing Prompts datasets. Additionally, our approach demonstrates strong reliability in the wild, exhibiting robust generalization and resistance to paraphrasing attacks. Data and code are available at: https://github.com/NLP2CT/GECScore.
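The average AUROC reported in the abstract summarizes how well a detection score separates the two classes across all possible thresholds. The following minimal sketch of that metric assumes LLM-generated text is the positive class and that a higher GECScore means "more likely LLM-generated". Under those assumptions, AUROC equals the probability that a randomly chosen LLM-generated text scores higher than a randomly chosen human-written text, with ties counted as one half. The scores in the example are made up for illustration.

```python
from typing import Sequence


def auroc(human_scores: Sequence[float], llm_scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUROC with LLM-generated text as the positive class."""
    if not human_scores or not llm_scores:
        raise ValueError("need at least one score per class")
    wins = 0.0
    for pos in llm_scores:          # every LLM-generated text...
        for neg in human_scores:    # ...compared with every human-written text
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5         # ties count as half a win
    return wins / (len(llm_scores) * len(human_scores))


# Illustrative (made-up) scores: one LLM text scores below one human text.
print(round(auroc([0.40, 0.60, 0.70], [0.65, 0.90, 1.00]), 3))  # 0.889
```

The pairwise loop is quadratic and is only meant to show the definition. For large evaluation sets, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.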

Paper Structure (71 sections, 5 figures, 6 tables, 1 algorithm)

Figures (5)

  • Figure 1: Distribution of grammatical errors in human-written texts and in texts generated by GPT-3.5-turbo, PaLM-2-bison, Claude-3.5-Sonnet, and Llama-3-70B on XSum and Writing Prompts. GPT-4o [blog2024gpt4o] is used to mark the grammatical errors.
  • Figure 2: GECScore Framework Overview. First, a grammar correction model generates a grammatically corrected version $\tilde{x}_i$ of the input text $x_i$. Next, the similarity score $s_i = \mathrm{Sim}(x_i, \tilde{x}_i)$ between $x_i$ and $\tilde{x}_i$ is computed with a similarity metric $\mathrm{Sim}$. Finally, if $s_i \geq \epsilon$ for a threshold $\epsilon$, the text is judged more likely to be LLM-generated.
  • Figure 3: Distribution of the types of editing operations required to correct grammatical errors in human-written texts and in texts generated by GPT-3.5-turbo, PaLM-2-bison, Claude-3.5-Sonnet, and Llama-3-70B on XSum and Writing Prompts.
  • Figure 4: Impact of Different GEC Models and Scoring Metrics on Performance Across Text Lengths. The plot shows AUROC as a function of text length: the x-axis gives the number of words in the text, and the y-axis gives the corresponding detection performance of GECScore under each setting.
  • Figure 5: Effect of the Balanced Sample Set Size $n$.
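The decision rule in Figure 2 (flag a text as LLM-generated when $s_i \geq \epsilon$) can be sketched in a few lines. The captions do not say how $\epsilon$ is chosen. As an assumption for illustration only, this sketch picks it on a small labeled, balanced set of $n$ human-written and $n$ LLM-generated scores, loosely echoing the balanced sample set of Figure 5. It selects the smallest observed score that maximizes accuracy on that set. `pick_threshold`, `classify`, and the example scores are hypothetical and are not taken from the paper's code.

```python
from typing import Sequence


def classify(score: float, epsilon: float) -> str:
    """Figure 2's rule: s_i >= epsilon means 'more likely LLM-generated'."""
    return "llm" if score >= epsilon else "human"


def pick_threshold(human_scores: Sequence[float], llm_scores: Sequence[float]) -> float:
    """Choose epsilon by maximizing accuracy on a balanced labeled set (an assumption)."""
    if len(human_scores) != len(llm_scores) or not human_scores:
        raise ValueError("expected a non-empty balanced set (n human, n LLM)")

    def accuracy(eps: float) -> float:
        correct = sum(s < eps for s in human_scores) + sum(s >= eps for s in llm_scores)
        return correct / (len(human_scores) + len(llm_scores))

    # Candidate thresholds are the observed scores, in ascending order;
    # max() keeps the first (smallest) candidate among ties in accuracy.
    candidates = sorted(set(human_scores) | set(llm_scores))
    return max(candidates, key=accuracy)


human = [0.40, 0.55, 0.62]  # made-up GECScores for n = 3 human-written texts
llm = [0.70, 0.88, 0.95]    # made-up GECScores for n = 3 LLM-generated texts
epsilon = pick_threshold(human, llm)
print(epsilon, classify(0.66, epsilon), classify(0.91, epsilon))  # 0.7 human llm
```

A balanced set keeps the accuracy criterion from favoring whichever class happens to dominate the calibration data. How large $n$ must be for a stable threshold is the kind of question Figure 5 examines.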