AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees

Hongyi Zhou, Jin Zhu, Pingfan Su, Kai Ye, Ying Yang, Shakeel A O B Gavioli-Akilagun, Chengchun Shi

TL;DR

This work addresses the challenge of distinguishing human-authored from LLM-generated text by augmenting existing log-probability detectors with an adaptively learned witness function, AdaDetectGPT. It develops a statistically principled statistic based on token-level log-probabilities, leverages martingale central limit theory to set FNR-controlled thresholds, and optimizes a lower bound on TNR to learn the witness function using spline-based features. The approach yields finite-sample guarantees on FNR, TNR, FPR, and TPR, and achieves consistent AUC gains across multiple datasets and target models in both white-box and black-box settings, with notable improvements over Fast-DetectGPT. The work also provides practical guidance on training the witness, analyzes computational efficiency, and releases an open-source Python implementation, contributing a robust, data-driven method that blends statistics with learning for LLM-detection tasks.

Abstract

We study the problem of determining whether a piece of text has been authored by a human or by a large language model (LLM). Existing state-of-the-art logits-based detectors make use of statistics derived from the log-probability of the observed text evaluated using the distribution function of a given source LLM. However, relying solely on log probabilities can be sub-optimal. In response, we introduce AdaDetectGPT, a novel classifier that adaptively learns a witness function from training data to enhance the performance of logits-based detectors. We provide statistical guarantees on its true positive rate, false positive rate, true negative rate and false negative rate. Extensive numerical studies show that AdaDetectGPT nearly uniformly improves on the state-of-the-art method across various combinations of datasets and LLMs, and the improvement can reach up to 37%. A Python implementation of our method is available at https://github.com/Mamba413/AdaDetectGPT.
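To make the idea of a witness function concrete, the sketch below computes one plausible form of a witness-transformed, token-level statistic: each observed token contributes w(log p) minus its conditional mean under the model, and the sum is scaled by the square root of the summed conditional variances. This centered-and-scaled form is an assumption made for illustration and is not quoted from the text above; the function name `witness_statistic` and its arguments are hypothetical. With the identity witness it reduces to a plain log-probability statistic.

```python
import math
from typing import Callable, Sequence


def witness_statistic(
    probs: Sequence[Sequence[float]],
    tokens: Sequence[int],
    w: Callable[[float], float] = lambda lp: lp,
) -> float:
    """Centered, scaled sum of w(log-probability) over observed tokens.

    probs[t] is the model's next-token distribution at position t
    (assumed strictly positive, e.g. a softmax output), and tokens[t]
    is the index of the token actually observed at that position.
    Illustrative form only; not the paper's exact definition.
    """
    numerator = 0.0
    variance = 0.0
    for dist, tok in zip(probs, tokens):
        vals = [w(math.log(p)) for p in dist]
        # Conditional mean and variance of w(log p) under the model itself.
        mean = sum(p * v for p, v in zip(dist, vals))
        second_moment = sum(p * v * v for p, v in zip(dist, vals))
        numerator += vals[tok] - mean
        variance += second_moment - mean * mean
    if variance <= 0.0:
        raise ValueError("conditional variance is zero; statistic undefined")
    return numerator / math.sqrt(variance)


# Identity witness versus a squared-log-probability witness on a toy example.
identity_stat = witness_statistic([[0.9, 0.1]], [0])
squared_stat = witness_statistic([[0.9, 0.1]], [0], w=lambda lp: lp * lp)
```

Because each term is centered by its own conditional mean, the numerator is a sum of mean-zero increments when the text really is sampled from the model, which is the kind of structure a martingale central limit argument relies on. Changing `w` changes how strongly each log-probability is weighted, which is the degree of freedom a learned witness would exploit.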

Paper Structure

This paper contains 24 sections, 10 theorems, 75 equations, 8 figures, 11 tables.

Key Result

Theorem 1

Under the equal variance condition specified in the appendix section on assumptions, $\textrm{TNR}_w$ is asymptotically lower bounded by $\min\{\alpha+\Phi'(z_{\alpha})T_w^{(2*)}, 1-\alpha \}$ where $\Phi'$ is the derivative of $\Phi$ and $T_w^{(2*)}$ denotes a population version of $T_w^{(2)}(\bm{X})$, given
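As a small numerical illustration of the bound in Theorem 1, the sketch below evaluates $\min\{\alpha+\Phi'(z_{\alpha})T, 1-\alpha\}$ for a supplied value $T$ standing in for $T_w^{(2*)}$. It assumes that $\Phi$ is the standard normal CDF and that $z_{\alpha}$ is a standard normal quantile at level $\alpha$; neither is defined in the excerpt above. Since the normal density is symmetric, the result is the same whether $z_{\alpha}$ denotes the lower or the upper quantile. The function name `tnr_lower_bound` is hypothetical.

```python
from statistics import NormalDist


def tnr_lower_bound(alpha: float, t2_star: float) -> float:
    """Evaluate min{alpha + Phi'(z_alpha) * T, 1 - alpha}.

    Assumes Phi is the standard normal CDF, so Phi' is its density,
    and z_alpha is the alpha-level standard normal quantile.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    std_normal = NormalDist()
    # Lower quantile used here; the density is even, so the upper
    # quantile -z_alpha would give the same value.
    z_alpha = std_normal.inv_cdf(alpha)
    return min(alpha + std_normal.pdf(z_alpha) * t2_star, 1.0 - alpha)


# The bound grows linearly in T until it is capped at 1 - alpha.
bounds = [tnr_lower_bound(0.05, t) for t in (0.0, 1.0, 5.0, 20.0)]
```

For small $\alpha$ the bound is increasing in $T$ up to the cap $1-\alpha$, which is consistent with choosing the witness function to make $T_w^{(2)}$ large.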

Figures (8)

  • Figure 1: Workflow of AdaDetectGPT. Built upon Fast-DetectGPT (Bao et al., 2024), our method adaptively learns a witness function $\widehat{w}$ from training data by maximizing a lower bound on the TNR, while using a normal approximation for FNR control.
  • Figure 2: Boxplots visualizing the differences in the statistical measures between human- and LLM-authored passages, comparing AdaDetectGPT (with a learned witness function) and Fast-DetectGPT (without it). A larger positive difference from zero indicates better detection power. As observed, the difference computed by AdaDetectGPT is consistently larger than that of Fast-DetectGPT across the first quartile, median, and third quartile. The left panel shows statistics evaluated on the SQuAD dataset, while the right panel displays results for the WritingPrompts dataset.
  • Figure 3: Top panel: FNR of the classifier plotted against the significance level $\alpha$. Bottom panel: Distribution of statistics evaluated on LLM-generated text. The dashed red line is the density function of a standard normal random variable. Results are shown across three datasets (from left to right) and two language models (indicated by different colors).
  • Figure 4: A summary of our theories.
  • Figure S5: Classification accuracy versus the sample size for training $w$. We omit DetectGPT, NPR, and DNA in this experiment as they are time-consuming.
  • ...and 3 more figures

Theorems & Definitions (18)

  • Theorem 1: TNR lower bound
  • Theorem 2: FNR
  • Theorem 3: TNR
  • Corollary 4: TPR
  • Corollary 5: FPR
  • Theorem S6: Bounded differences inequality
  • proof: Proof of Theorem S6 (bounded differences inequality)
  • Theorem S7: Martingale central limit theorem
  • proof
  • Lemma S1: Convergence rates in MCLT
  • ...and 8 more