Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection

Cong Zeng, Shengkun Tang, Yuanzhou Chen, Zhiqiang Shen, Wenchao Yu, Xujiang Zhao, Haifeng Chen, Wei Cheng, Zhiqiang Xu

TL;DR

A detection framework is developed using one-class learning methods, including DeepSVDD and HRN, and score-based learning techniques such as the energy-based method, enabling robust and generalizable performance and validating the effectiveness of the OOD-based approach.

Abstract

The rapid advancement of large language models (LLMs) such as ChatGPT, DeepSeek, and Claude has significantly increased the presence of AI-generated text in digital communication. This trend has heightened the need for reliable detection methods to distinguish between human-authored and machine-generated content. Existing approaches, both zero-shot methods and supervised classifiers, largely conceptualize this task as a binary classification problem, often leading to poor generalization across domains and models. In this paper, we argue that such a binary formulation fundamentally mischaracterizes the detection task by assuming a coherent representation of human-written texts. In reality, human texts do not constitute a unified distribution, and their diversity cannot be effectively captured through limited sampling. This causes previous classifiers to memorize observed OOD characteristics rather than learn the essence of 'non-ID' behavior, limiting generalization to unseen human-authored inputs. Based on this observation, we propose reframing the detection task as an out-of-distribution (OOD) detection problem, treating human-written texts as distributional outliers while machine-generated texts are in-distribution (ID) samples. To this end, we develop a detection framework using one-class learning methods, including DeepSVDD and HRN, and score-based learning techniques such as the energy-based method, enabling robust and generalizable performance. Extensive experiments across multiple datasets validate the effectiveness of our OOD-based approach. Specifically, the OOD-based method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on the DeepFake dataset. Moreover, we test our detection framework on multilingual, attacked, unseen-model, and unseen-domain text settings, demonstrating the robustness and generalizability of our framework. Code, pretrained weights, and a demo will be released.
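The abstract reports FPR95, that is, the false positive rate at the threshold where 95% of in-distribution samples are accepted. A minimal sketch of how such a number could be computed is given below. It assumes that a higher score means "more in-distribution" (machine-generated) and that the human-written texts are the OOD class; the function name `fpr_at_tpr` and the toy scores are hypothetical and are not taken from the paper's code.

```python
import math

def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted as ID when at least `tpr` of ID samples are accepted.

    Assumption: higher score = more in-distribution (machine-generated).
    """
    # Sort ID scores from most to least in-distribution.
    sorted_id = sorted(id_scores, reverse=True)
    # Smallest number of ID samples that must be accepted to reach the target TPR.
    k = math.ceil(tpr * len(sorted_id))
    # Accept everything scoring at or above the k-th highest ID score.
    threshold = sorted_id[k - 1]
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)

# Toy scores, purely illustrative.
machine_scores = [0.9, 0.8, 0.7, 0.6]   # ID samples
human_scores = [0.65, 0.1, 0.2, 0.3]    # OOD samples
toy_fpr95 = fpr_at_tpr(machine_scores, human_scores)
```

With only four ID samples, reaching 95% TPR requires accepting all of them, so the threshold falls at the lowest machine score and any human text scoring above it counts as a false positive.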

Paper Structure

This paper contains 33 sections, 2 theorems, 28 equations, 3 figures, 11 tables.

Key Result

Theorem 2

Consider training distribution $\mathcal{D} = (q_M, q_H, P_{M}, P_{H})$ and real-world distribution $\widehat{\mathcal{D}} = (q_M, q_H, \widehat{P}_{M}, \widehat{P}_{H})$ both consistent with the ground truth probability $\widehat{p}_M(\cdot)$. If the Pearson $\chi^2$ divergence $D := D_{\chi^2} (P_

Figures (3)

  • Figure 1: Decision boundaries and model performance comparison. Left: Decision boundaries under distributional asymmetry. The binary classifier separates machine-generated text from limited human-written samples but fails to capture the variability in the true human distribution (e.g., true OOD subset). Right (table): Quantitative comparison of intra- and inter-distance for clean and attacked data, indicating that distances among human texts are larger than distances among LLM-generated texts.
  • Figure 2: An overview of our proposed OOD detection pipeline using the DeepSVDD method. The pipeline shows the process of training a text encoder with a DeepSVDD loss. Right: Illustration of the learned embedding space, in which LLM-generated texts are enclosed within a hypersphere while human-written texts fall outside.
  • Figure 3: Hyper-parameter sensitivity analysis of our method. The experiments are conducted on the DeepFake dataset and show that our method is robust to the choice of weight.
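The hypersphere picture in Figure 2 can be illustrated with a small one-class scoring sketch in the spirit of DeepSVDD. This is not the paper's implementation: it assumes the embeddings are already computed (the paper trains a text encoder with this kind of loss), fixes the center as the mean of the machine-generated embeddings, and uses a hand-picked radius. All names (`svdd_loss`, `is_human`, `radius_sq`) and the toy vectors are hypothetical.

```python
def mean_vector(vectors):
    """Coordinate-wise mean, used here as the hypersphere center."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def sq_dist(u, c):
    """Squared Euclidean distance between an embedding and the center."""
    return sum((a - b) ** 2 for a, b in zip(u, c))

def svdd_loss(embeddings, center):
    """One-class objective: mean squared distance of ID embeddings to the center.

    In a trained model this quantity would be minimized with respect to the encoder weights.
    """
    return sum(sq_dist(z, center) for z in embeddings) / len(embeddings)

def is_human(z, center, radius_sq):
    """Flag an embedding as OOD (human-written) if it falls outside the hypersphere."""
    return sq_dist(z, center) > radius_sq

# Toy 2-D embeddings of machine-generated (ID) texts, purely illustrative.
machine_emb = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
center = mean_vector(machine_emb)
loss = svdd_loss(machine_emb, center)

radius_sq = 2.0  # assumed threshold; in practice chosen on validation data
far_point_flag = is_human([5.0, 5.0], center, radius_sq)
near_point_flag = is_human([1.0, 1.5], center, radius_sq)
```

Only machine-generated samples are needed to fit the center, which is the point of the one-class framing: human texts are never modeled directly and are detected simply by falling far from the learned region.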

Theorems & Definitions (5)

  • Theorem 2
  • Definition 3
  • Definition 4
  • Remark 5
  • Theorem 6