AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees
Hongyi Zhou, Jin Zhu, Pingfan Su, Kai Ye, Ying Yang, Shakeel A O B Gavioli-Akilagun, Chengchun Shi
TL;DR
This work addresses the challenge of distinguishing human-authored from LLM-generated text by introducing AdaDetectGPT, which augments existing log-probability detectors with an adaptively learned witness function. It constructs a statistically principled test statistic from token-level log-probabilities, uses martingale central limit theory to set thresholds that control the false negative rate (FNR), and learns the witness function from spline-based features by optimizing a lower bound on the true negative rate (TNR). The approach yields finite-sample guarantees on the FNR, TNR, false positive rate (FPR), and true positive rate (TPR). It achieves consistent AUC gains across multiple datasets and target models in both white-box and black-box settings, with notable improvements over Fast-DetectGPT. The work also provides practical guidance on training the witness function, analyzes computational efficiency, and releases an open-source Python implementation, contributing a robust, data-driven method that blends statistical inference with learning for LLM-text detection.
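To make the moving parts above concrete, here is a minimal Python sketch of a witness-augmented log-probability statistic with a normal-approximation threshold. It rests on our own assumptions, not on the released code: a single source model supplies the full next-token distribution at every position; the witness function w is a piecewise-linear (degree-1) spline with fixed, hypothetical coefficients (the paper's spline features may differ, and learning the coefficients by maximizing the TNR lower bound is not shown); and the martingale CLT approximation is taken at face value. The function names `hat_basis`, `make_witness`, `detection_statistic`, and `is_llm_generated` are hypothetical and are not the API of the AdaDetectGPT package. When the coefficients equal the knots, the spline reproduces the identity, and the statistic reduces to a Fast-DetectGPT-style centered log-probability score.

```python
import math
from statistics import NormalDist
from typing import Callable, Sequence


def hat_basis(z: float, knots: Sequence[float]) -> list[float]:
    """Degree-1 (piecewise-linear) B-spline basis evaluated at z.

    `knots` must be strictly increasing with at least two entries. z is
    clamped to [knots[0], knots[-1]], so the basis values always sum to 1.
    """
    z = min(max(z, knots[0]), knots[-1])
    values = []
    for k, center in enumerate(knots):
        left = knots[k - 1] if k > 0 else None
        right = knots[k + 1] if k + 1 < len(knots) else None
        if left is not None and left <= z <= center:
            values.append((z - left) / (center - left))
        elif right is not None and center <= z <= right:
            values.append((right - z) / (right - center))
        else:
            values.append(0.0)
    return values


def make_witness(knots: Sequence[float], coefs: Sequence[float]) -> Callable[[float], float]:
    """Hypothetical spline witness w(z) = sum_k coefs[k] * B_k(z)."""
    def witness(z: float) -> float:
        return sum(c * b for c, b in zip(coefs, hat_basis(z, knots)))
    return witness


def detection_statistic(
    token_dists: Sequence[Sequence[float]],
    observed: Sequence[int],
    witness: Callable[[float], float],
) -> float:
    """Standardized sum of centered witness scores over the tokens.

    token_dists[t][v] is the source model's p(v | x_<t) (all entries > 0),
    and observed[t] is the index of the token actually seen at position t.
    """
    if len(token_dists) != len(observed):
        raise ValueError("need one distribution per observed token")
    centered_sum, total_var = 0.0, 0.0
    for probs, x in zip(token_dists, observed):
        scores = [witness(math.log(p)) for p in probs]
        mean = sum(p * s for p, s in zip(probs, scores))        # E[w(log p(X_t))]
        second = sum(p * s * s for p, s in zip(probs, scores))  # E[w(log p(X_t))^2]
        centered_sum += scores[x] - mean   # martingale-difference increment
        total_var += second - mean * mean  # conditional variance at step t
    if total_var <= 0:
        raise ValueError("witness scores have zero variance under the model")
    return centered_sum / math.sqrt(total_var)


def is_llm_generated(stat: float, alpha: float = 0.05) -> bool:
    """Flag as LLM-generated unless the statistic falls below the alpha-quantile
    of N(0, 1); under the normal approximation this keeps the FNR near alpha."""
    return stat >= NormalDist().inv_cdf(alpha)


if __name__ == "__main__":
    knots = [-10.0, -5.0, -2.0, -1.0, 0.0]
    witness = make_witness(knots, knots)  # coefs == knots reproduces log p on [-10, 0]
    dists = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.9, 0.05, 0.05]]
    for tokens in ([0, 0, 0], [2, 2, 1]):
        t_stat = detection_statistic(dists, tokens, witness)
        print(tokens, round(t_stat, 3), is_llm_generated(t_stat))
```

The toy run scores a sequence of top-probability tokens and a sequence of low-probability tokens; only the first is flagged as LLM-generated. In this sketch, the adaptive part of the method would enter by replacing the identity-reproducing coefficients with learned ones, while the centering and variance terms keep the increments mean-zero under the source model, which is what the normal threshold relies on.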
Abstract
We study the problem of determining whether a piece of text has been authored by a human or by a large language model (LLM). Existing state-of-the-art logits-based detectors use statistics derived from the log-probability of the observed text, evaluated under the distribution of a given source LLM. However, relying solely on log-probabilities can be suboptimal. In response, we introduce AdaDetectGPT, a novel classifier that adaptively learns a witness function from training data to enhance the performance of logits-based detectors. We provide statistical guarantees on its true positive rate, false positive rate, true negative rate, and false negative rate. Extensive numerical studies show that AdaDetectGPT nearly uniformly improves on the state-of-the-art method across various combinations of datasets and LLMs, with improvements of up to 37%. A Python implementation of our method is available at https://github.com/Mamba413/AdaDetectGPT.
