DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

Xianjun Yang; Wei Cheng; Yue Wu; Linda Petzold; William Yang Wang; Haifeng Chen

TL;DR

DNA-GPT introduces Divergent N-Gram Analysis, a training-free detector that distinguishes GPT-generated from human-written text by exploiting differences in the distribution of continuations conditioned on the preceding text. It formalizes the Likelihood-Gap Hypothesis and implements two detectors: a black-box detector that scores n-gram overlap (BScore) and a white-box detector that scores token-probability ratios (WScore), plus evidence En for explainability. Across five datasets and multiple models, DNA-GPT achieves state-of-the-art zero-shot detection, including on German text, and is robust to text revisions and model updates, with the added capability of model sourcing. The method is explainable and training-free, and code is released for reproducibility and further research.
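The white-box idea of comparing token probabilities can be illustrated with a small sketch. It assumes the total log-probabilities of the original continuation and of each regenerated continuation, conditioned on the same truncated prefix, have already been obtained from the model. The function name `log_ratio_score` and the plain averaging are assumptions made for illustration; the paper's WScore may be defined or normalized differently.

```python
def log_ratio_score(logp_original, logp_regenerations):
    """Average log-probability ratio between the original continuation
    and K regenerated continuations (illustrative, not the paper's exact WScore).

    logp_original: log p(original continuation | prefix), a float.
    logp_regenerations: list of log p(regenerated continuation k | prefix).
    """
    if not logp_regenerations:
        raise ValueError("need at least one regenerated continuation")
    # log(p_orig / p_k) = log p_orig - log p_k, averaged over the K samples.
    gaps = [logp_original - lp for lp in logp_regenerations]
    return sum(gaps) / len(gaps)


# Hypothetical log-probabilities, chosen only to show the arithmetic.
score = log_ratio_score(-10.0, [-12.0, -14.0])
```

Under this sign convention, a strongly negative value means the original continuation is much less likely under the model than the model's own regenerations, which is the pattern the Likelihood-Gap Hypothesis would associate with human-written text.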

Abstract

Large language models (LLMs) have notably enhanced the fluency and diversity of machine-generated text. However, this progress also presents a significant challenge in detecting the origin of a given text, and current research on detection methods lags behind the rapid evolution of LLMs. Conventional training-based methods have limitations in flexibility, particularly when adapting to new domains, and they often lack explanatory power. To address this gap, we propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT). Given a text, we first truncate it in the middle and then use only the preceding portion as input to the LLM to regenerate the remaining part. By analyzing the differences between the original and regenerated remaining parts, through N-gram analysis in the black-box setting or probability divergence in the white-box setting, we unveil significant discrepancies between the distribution of machine-generated text and the distribution of human-written text. We conducted extensive experiments on the most advanced LLMs from OpenAI, including text-davinci-003, GPT-3.5-turbo, and GPT-4, as well as open-source models such as GPT-NeoX-20B and LLaMA-13B. Results show that our zero-shot approach exhibits state-of-the-art performance in distinguishing between human and GPT-generated text on four English datasets and one German dataset, outperforming OpenAI's own classifier, which is trained on millions of texts. Additionally, our method provides reasonable explanations and evidence to support its decisions, which is a unique feature of explainable detection. Our method is also robust under the revised-text attack and can additionally solve model sourcing. Code is available at https://github.com/Xianjun-Yang/DNA-GPT.
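The truncate-then-compare procedure in the black-box setting can be sketched as follows. This is a simplified illustration, not the paper's BScore: whitespace tokenization, the n-gram range, the `n * log(n)` weight that favours longer matches, and the normalization are all assumptions, and the regenerated continuations are passed in as plain strings rather than sampled from a model.

```python
import math
from collections import Counter


def truncate(text, ratio=0.5):
    """Split text into a prefix and the remaining part at the given ratio."""
    tokens = text.split()  # assumption: whitespace tokenization
    cut = int(len(tokens) * ratio)
    return tokens[:cut], tokens[cut:]


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_score(y0, regenerations, n_min=2, n_max=4):
    """Weighted n-gram overlap between the original remainder y0 and
    K regenerated remainders (illustrative weighting and normalization)."""
    if not regenerations:
        return 0.0
    total = 0.0
    for y_k in regenerations:
        for n in range(n_min, n_max + 1):
            ref, cand = ngrams(y0, n), ngrams(y_k, n)
            if not ref or not cand:
                continue
            shared = sum((ref & cand).values())  # multiset intersection size
            total += n * math.log(n) * shared / (len(y_k) * sum(ref.values()))
    return total / len(regenerations)


prefix, y0 = truncate("the study found that the cat sat on the mat", ratio=0.4)
same = overlap_score(y0, [list(y0)])
different = overlap_score(y0, ["a dog ran in a park".split()])
```

In use, each regeneration would come from feeding the prefix to the model under test; a higher score means the original remainder shares more long n-grams with the model's own continuations.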

Paper Structure (29 sections, 11 equations, 8 figures, 16 tables)

Introduction
Related Work
Methodology
Likelihood-Gap Hypothesis
Black-box Detection
White-box Detection
Evidence
Experiments
Experimental Setup
Results and Analysis
Conclusion
Theoretical Analysis
Is it always possible to distinguish between AI-generated text and Human?
Principled Choice of $K$
Additional Experimental Results
...and 14 more sections

Figures (8)

Figure 1: Overview of our framework. Given a candidate passage $x$, we aim to distinguish whether it was generated by a certain language model, such as GPT-3.5-turbo, or written by a human. Our method first truncates the original passage by a ratio to obtain the truncated text $x'$ and the remaining text $y_0$; $x'$ is then fed into the language model to generate $K$ new outputs $\{y_1, ..., y_K \}$. Finally, a BScore or WScore between the new outputs and $y_0$ is calculated to classify the original candidate $x$ as human-written or AI-generated. The threshold $\epsilon$ balances TPR and FPR. This example is taken from the PubMedQA dataset.
Figure 2: Difference on text-davinci-003 generation on Reddit prompts.
Figure 3: The impact of truncation ratio.
Figure 4: A comparative analysis of AUROC scores and TPR (at a 1% FPR) across four datasets for different numbers of regenerations. The analysis is performed under both black-box and white-box settings, utilizing the gpt-3.5-turbo and text-davinci-003 models, respectively.
Figure 5: The impact of decoding temperature on detection performance, conducted using gpt-3.5-turbo.
...and 3 more figures
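The captions above refer to a threshold $\epsilon$ that balances TPR and FPR, and to TPR at a 1% FPR. One common way to compute such an operating point from detector scores is sketched below. The helper names and the convention that a higher score means "more likely machine-generated" are assumptions for illustration, not details taken from the paper.

```python
def threshold_at_fpr(human_scores, target_fpr=0.01):
    """Pick a threshold so that at most target_fpr of human texts score above it."""
    ranked = sorted(human_scores, reverse=True)
    allowed = int(len(ranked) * target_fpr)  # human texts allowed to be flagged
    # Texts are flagged when score > threshold, so at most `allowed` humans exceed it.
    return ranked[min(allowed, len(ranked) - 1)]


def tpr_at_threshold(machine_scores, eps):
    """Fraction of machine-generated texts flagged at threshold eps."""
    return sum(s > eps for s in machine_scores) / len(machine_scores)


# Hypothetical scores, chosen only to show the arithmetic.
human = [i / 100 for i in range(100)]   # 0.00, 0.01, ..., 0.99
machine = [0.5, 0.985, 1.2, 2.0]
eps = threshold_at_fpr(human, target_fpr=0.01)
tpr = tpr_at_threshold(machine, eps)
```

Here exactly one of the 100 human scores lies above the chosen threshold, so the false positive rate is 1%, and the TPR is the share of machine scores above that same threshold.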

Theorems & Definitions (1)

proof
