On the Possibilities of AI-Generated Text Detection
Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang
TL;DR
The paper studies the problem of distinguishing AI-generated text from human-written text through an information-theoretic lens. It shows that, unless the human and machine text distributions are identical, detection is feasible when multiple samples are collected: the best performance is achieved by likelihood-ratio detectors, and the AUROC approaches 1 exponentially fast in the number of samples, at a rate governed by the Chernoff information between the two distributions. It provides explicit sample-complexity bounds for both i.i.d. and non-i.i.d. samples, and validates the theory with experiments on multiple datasets and generator/detector pairs. The results support practical multi-sample detectors as a robust tool to mitigate misuse of LLMs, while acknowledging challenges from paraphrasing and from distributions that lie close together. Overall, the work lays a theoretical and empirical foundation for multi-sample AI-generated text detection and informs detector and watermark design for real-world deployment.
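The exponential dependence on sample size can be made concrete with a standard Chernoff-bound argument. The sketch below assumes n i.i.d. samples over a discrete alphabet and uses notation chosen here for illustration (m and h for the machine and human distributions, TV for total variation distance, C for Chernoff information); it is a textbook derivation and not necessarily the exact form of the paper's theorems.

```latex
\begin{align*}
C(m, h) &= -\min_{0 \le \lambda \le 1} \log \sum_{x} m(x)^{\lambda}\, h(x)^{1-\lambda}, \\
1 - \mathrm{TV}\big(m^{\otimes n}, h^{\otimes n}\big)
  &= \sum_{x_{1:n}} \min\big\{ m^{\otimes n}(x_{1:n}),\; h^{\otimes n}(x_{1:n}) \big\} \\
  &\le \min_{0 \le \lambda \le 1} \Big( \sum_{x} m(x)^{\lambda}\, h(x)^{1-\lambda} \Big)^{n}
   = e^{-n\, C(m, h)}, \\
\mathrm{AUROC}_n
  &\ge \tfrac{1}{2} + \tfrac{1}{2}\, \mathrm{TV}\big(m^{\otimes n}, h^{\otimes n}\big)
   \ge 1 - \tfrac{1}{2}\, e^{-n\, C(m, h)} .
\end{align*}
```

The second line uses the inequality min{a, b} <= a^λ b^(1-λ) for nonnegative a and b, and the last line uses the fact that the ROC curve of the likelihood-ratio test is concave and passes through a point where the true-positive rate exceeds the false-positive rate by exactly the total variation distance. Since C(m, h) > 0 whenever m and h differ, the gap between the AUROC and 1 shrinks exponentially in n.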
Abstract
Our work addresses the critical issue of distinguishing text generated by Large Language Models (LLMs) from human-produced text, a task essential for numerous applications. Despite ongoing debate about the feasibility of such differentiation, we present evidence supporting its consistent achievability, except when human and machine text distributions are indistinguishable across their entire support. Drawing from information theory, we argue that as machine-generated text approximates human-like quality, the sample size needed for detection increases. We establish precise sample complexity bounds for detecting AI-generated text, laying groundwork for future research aimed at developing advanced, multi-sample detectors. Our empirical evaluations across multiple datasets (XSum, SQuAD, IMDb, and Kaggle FakeNews) confirm the viability of enhanced detection methods. We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero. Our findings align with OpenAI's empirical data related to sequence length, marking the first theoretical substantiation for these observations.
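The claim that closer distributions require more samples can be illustrated with a small simulation. The following is a minimal sketch, not the paper's experimental setup: the 4-symbol token distributions `HUMAN`, `MACHINE_FAR`, and `MACHINE_NEAR` are toy assumptions, the detector is an idealized likelihood-ratio test that knows both distributions exactly, and all function names are hypothetical.

```python
import math
import random

# Toy token distributions over a 4-symbol vocabulary (illustrative assumptions,
# not taken from the paper or from any real language model).
HUMAN = [0.40, 0.30, 0.20, 0.10]
MACHINE_FAR = [0.30, 0.30, 0.20, 0.20]   # total variation 0.10 from HUMAN
MACHINE_NEAR = [0.37, 0.30, 0.20, 0.13]  # total variation 0.03 from HUMAN


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def llr_score(tokens, machine, human):
    """Sum of per-token log-likelihood ratios log(machine(x) / human(x))."""
    return sum(math.log(machine[x] / human[x]) for x in tokens)


def auroc(machine_scores, human_scores):
    """Fraction of (machine, human) score pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for s_m in machine_scores:
        for s_h in human_scores:
            if s_m > s_h:
                wins += 1.0
            elif s_m == s_h:
                wins += 0.5
    return wins / (len(machine_scores) * len(human_scores))


def simulate_auroc(machine, human, n, trials=300, seed=0):
    """Estimate the AUROC of the n-sample likelihood-ratio detector by simulation."""
    rng = random.Random(seed)
    vocab = range(len(human))
    m_scores = [
        llr_score(rng.choices(vocab, weights=machine, k=n), machine, human)
        for _ in range(trials)
    ]
    h_scores = [
        llr_score(rng.choices(vocab, weights=human, k=n), machine, human)
        for _ in range(trials)
    ]
    return auroc(m_scores, h_scores)


if __name__ == "__main__":
    for name, machine in [("far", MACHINE_FAR), ("near", MACHINE_NEAR)]:
        tv = total_variation(machine, HUMAN)
        for n in (1, 10, 100, 1000):
            print(f"{name} (TV={tv:.2f}) n={n:4d} AUROC={simulate_auroc(machine, HUMAN, n):.3f}")
```

In this toy setting the estimated AUROC rises toward 1 as n grows for both machine distributions, but the one closer to `HUMAN` needs noticeably more samples to reach the same level, which is the qualitative behavior the abstract describes. A real detector does not have access to the true distributions, so these numbers should be read as an idealized best case.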
