MAGE: Machine-generated Text Detection in the Wild

Yafu Li; Qintong Li; Leyang Cui; Wei Bi; Zhilin Wang; Longyue Wang; Linyi Yang; Shuming Shi; Yue Zhang

TL;DR

The paper introduces MAGE, a testbed for machine-generated text detection in the wild. It pairs human-authored texts from seven writing tasks with machine-generated texts produced by 27 LLMs using three prompt types, and organizes them into eight increasingly challenging testbeds. The authors evaluate several detectors (naive baselines, a PLM-based Longformer, GLTR, FastText, and DetectGPT) and find strong in-domain performance but substantial degradation in out-of-domain and unseen-model settings; with decision boundary adjustment, average recall reaches around 86% on unseen GPT-4-generated text. A key finding is that perplexity is a robust, domain- and model-agnostic feature for separating human from machine text, although paraphrasing attacks can erode detector effectiveness. The work releases public resources and argues that deployment in real-world scenarios is feasible, while noting limitations related to new LLMs and reliance on benchmark sources.
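The average recall figure above can be made concrete with a small sketch. It assumes that average recall (AvgRec) is the mean of the recall on human-written texts and the recall on machine-generated texts, and that the detector outputs a probability that a text is machine-generated. The function names `avg_recall` and `best_threshold` and the toy scores are hypothetical; the summary does not describe how the authors adjust the decision boundary, so the threshold search is only an illustration of why moving the boundary can change recall.

```python
def avg_recall(labels, scores, threshold=0.5):
    """Return (human recall, machine recall, their mean).

    labels: 1 = machine-generated, 0 = human-written (assumed encoding).
    scores: detector's estimated probability that a text is machine-generated.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    human_total = sum(1 for y in labels if y == 0)
    machine_total = sum(1 for y in labels if y == 1)
    human_hits = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    machine_hits = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    human_rec = human_hits / human_total
    machine_rec = machine_hits / machine_total
    return human_rec, machine_rec, (human_rec + machine_rec) / 2


def best_threshold(labels, scores, candidates):
    """Pick the candidate threshold with the highest average recall."""
    return max(candidates, key=lambda t: avg_recall(labels, scores, t)[2])


# Toy data: one machine text receives a low score (0.3), as might happen
# when the generator was not seen during training.
labels = [0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.9]
default = avg_recall(labels, scores, 0.5)
shifted = avg_recall(labels, scores, 0.25)
chosen = best_threshold(labels, scores, [0.5, 0.25])
```

With the default threshold of 0.5 the low-scoring machine text is missed; lowering the threshold recovers it without misclassifying the human texts in this toy example. On real data the trade-off between the two recalls is usually less clean.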

Abstract

Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection to mitigate risks like the spread of fake news and plagiarism. Existing research has been constrained by evaluating detection methods on specific domains or particular language models. In practical scenarios, however, the detector faces texts from various domains or LLMs without knowing their sources. To this end, we build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs. Empirical results show challenges in distinguishing machine-generated texts from human-authored ones across various scenarios, especially out-of-distribution. These challenges are due to the decreasing linguistic distinctions between the two sources. Despite these challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios. We release our resources at https://github.com/yafuly/MAGE.

Paper Structure (49 sections, 15 figures, 12 tables)

Introduction
Related Work
Dataset Construction
Data Sourcing.
Model sets.
Prompts.
Detection Methods
Experimental Setup
Testbed Settings
Testbed 1: Fixed-domain & Model-specific.
Testbed 2: Arbitrary-domains & Model-specific.
Testbed 3: Fixed-domain & Arbitrary-models.
Testbed 4: Arbitrary-domains & Arbitrary-models.
Testbed 5: Unseen Models.
Testbed 6: Unseen Domains.
...and 34 more sections

Figures (15)

Figure 1: Machine-generated text detection in the wild: the detector encounters texts from various human writings or fake texts generated by diverse LLMs.
Figure 2: Out-of-distribution detection performance on machine-generated texts generated by unseen models. OpenAI(c), OpenAI(t) and OpenAI(s) correspond to texts generated by OpenAI models using continuation, topical and specified prompts, respectively.
Figure 3: Out-of-distribution detection performance (AvgRec) on texts from unseen domains.
Figure 4: Decision boundary adjustment.
Figure 5: Linguistic difference (Jensen-Shannon distance) between human-written texts and machine-generated texts in 4 in-distribution settings (darker colors indicate larger differences).
...and 10 more figures
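
The Jensen-Shannon distance named in the Figure 5 caption can be illustrated with a short sketch. It assumes the distance is the square root of the base-2 Jensen-Shannon divergence between two frequency distributions, and it uses plain token counts as the feature. The caption does not say which linguistic features the authors compare, so the token-level choice and the function name `js_distance` are assumptions made for illustration only.

```python
import math
from collections import Counter


def js_distance(tokens_a, tokens_b):
    """Jensen-Shannon distance (base 2) between two token frequency
    distributions; 0.0 means identical, 1.0 means no shared tokens."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    divergence = 0.0
    for token in set(counts_a) | set(counts_b):
        p = counts_a[token] / total_a
        q = counts_b[token] / total_b
        m = (p + q) / 2  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Toy token lists standing in for human-written and machine-generated text.
human = "the cat sat on the mat".split()
machine = "the cat sat on the sofa".split()
same = js_distance(human, human)
partial = js_distance(human, machine)
disjoint = js_distance(["a"], ["b"])
```

A smaller distance means the two sources are harder to tell apart on that feature, which is consistent with the abstract's point about decreasing linguistic distinctions.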
