Decoupling Content and Expression: Two-Dimensional Detection of AI-Generated Text

Guangsheng Bao, Lihua Rong, Yanbin Zhao, Qiji Zhou, Yue Zhang

TL;DR

This paper addresses the detection of AI participation in text across multiple risk levels. It introduces HART, a hierarchical framework of AI risk levels, and a 2D Detection Method that decouples a text's content from its language expression. Treating AI content and AI expression as separate signals, the authors show that content-based features are more robust to surface-level changes and adversarial edits, which yields substantial improvements on level-2 and level-1 detection and strong cross-language performance. The work also contributes the HART benchmark, which spans diverse domains and languages, along with ablation analyses and practical insights into feature choice, model impact, and data distribution effects. Combining content and expression signals in the 2D framework achieves state-of-the-art results on RAID and shows resilience against common detection attacks, with implications for policy, safety, and content moderation in multilingual contexts.

Abstract

The wide usage of LLMs raises critical requirements on detecting AI participation in texts. Existing studies investigate these detections in scattered contexts, leaving a systematic and unified approach unexplored. In this paper, we present HART, a hierarchical framework of AI risk levels, each corresponding to a detection task. To address these tasks, we propose a novel 2D Detection Method, decoupling a text into content and language expression. Our findings show that content is resistant to surface-level changes and can therefore serve as a key feature for detection. Experiments demonstrate that the 2D method significantly outperforms existing detectors, achieving an AUROC improvement from 0.705 to 0.849 for level-2 detection and from 0.807 to 0.886 for RAID. We release our data and code at https://github.com/baoguangsheng/truth-mirror.
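
To make the idea of a two-dimensional detector concrete, the following is a minimal sketch, not the authors' implementation. It assumes each text has already been reduced to two numbers, a content score and an expression score. How the paper computes those scores is not reproduced here, so the values below are synthetic. The sketch also assumes a plain logistic regression as the binary classifier, fitted on a small dev set and evaluated with AUROC. The function names (`auroc`, `fit_logistic_2d`, `make_synthetic`) are hypothetical.

```python
import math
import random


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic_2d(points, labels, lr=0.1, epochs=500):
    """Fit w1*x1 + w2*x2 + b by batch gradient descent on the log loss."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(points, labels):
            p = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))
            g1 += (p - y) * x1
            g2 += (p - y) * x2
            gb += p - y
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def make_synthetic(n, seed):
    """Synthetic (content, expression) scores; label 1 stands for AI participation.

    Assumption for illustration only: positives are shifted by +1 on both axes.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for i in range(n):
        y = i % 2
        content = rng.gauss(1.0 * y, 1.0)
        expression = rng.gauss(1.0 * y, 1.0)
        points.append((content, expression))
        labels.append(y)
    return points, labels


# Fit the 2D classifier on a small dev set, then score a held-out test set.
dev_x, dev_y = make_synthetic(200, seed=0)
test_x, test_y = make_synthetic(400, seed=1)
w1, w2, b = fit_logistic_2d(dev_x, dev_y)

combined = [w1 * x1 + w2 * x2 + b for x1, x2 in test_x]
auc_2d = auroc(combined, test_y)
auc_expression_only = auroc([x2 for _, x2 in test_x], test_y)

print(f"AUROC, 2D classifier:       {auc_2d:.3f}")
print(f"AUROC, expression axis only: {auc_expression_only:.3f}")
```

The numbers printed by this sketch describe the synthetic data only and say nothing about the results reported in the paper. The point is structural: each detection task becomes a binary decision boundary in a plane whose axes are the two decoupled signals.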

Paper Structure

This paper contains 78 sections, 8 figures, 8 tables.

Figures (8)

  • Figure 1: AI participation in text creation
  • Figure 2: The detection tasks across three risk levels address the four types of AI participation. We represent these types in a two-dimensional space, leading to a 2D detection approach. In this method, the detector performs a binary classification within the two-dimensional space for each detection task.
  • Figure 3: Comparison on their ability to detect AI-generated texts, where 'xxx.ai' are external humanizing tools.
  • Figure 4: Content and expression features evaluated on AI detection tasks using conditional probability curvature as the feature metric.
  • Figure 5: Ablation on the number of dev samples required by the 2D method.