Raidar: geneRative AI Detection viA Rewriting

Chengzhi Mao, Carl Vondrick, Hao Wang, Junfeng Yang

TL;DR

Raidar detects AI-generated text by exploiting rewriting behavior: it prompts LLMs to rewrite the input text and derives invariance, equivariance, and uncertainty signals from the resulting symbol-level edits. These signals feed a binary detector, enabling robust detection across domains and under adaptive evasion without requiring access to LLM probability scores. Across six paragraph-level datasets and multiple generation models, Raidar yields substantial F1 gains over prior detectors (up to 29 points in-distribution and up to 32 points out-of-distribution) and remains effective when the rewriting model differs from the generating model or when the text is crafted to evade detection. The approach is simple, works with black-box LLMs regardless of which model generated the text, is robust to fine-tuning and non-native writing, and offers a practical path for auditing AI-generated content in education, publishing, and online platforms.

Abstract

We find that large language models (LLMs) are more likely to modify human-written text than AI-generated text when tasked with rewriting. This tendency arises because LLMs often perceive AI-generated text as high-quality, leading to fewer modifications. We introduce a method to detect AI-generated content by prompting LLMs to rewrite text and calculating the editing distance of the output. We dubbed our geneRative AI Detection viA Rewriting method Raidar. Raidar significantly improves the F1 detection scores of existing AI content detection models -- both academic and commercial -- across various domains, including News, creative writing, student essays, code, Yelp reviews, and arXiv papers, with gains of up to 29 points. Operating solely on word symbols without high-dimensional features, our method is compatible with black box LLMs, and is inherently robust on new content. Our results illustrate the unique imprint of machine-generated text through the lens of the machines themselves.
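The core idea in the abstract (rewrite the text, then measure how much changed) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `rewrite` callable stands in for whatever LLM call produces the rewritten text, and the names `rewrite_similarity`, `looks_machine_generated`, and the `threshold` value are assumptions introduced here. The paper feeds its rewriting signals to a trained binary detector; the fixed threshold below is only a stand-in for that step.

```python
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # drop a character of a
                curr[j - 1] + 1,           # add a character of b
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def rewrite_similarity(text: str, rewrite: Callable[[str], str]) -> float:
    """Return 1.0 when the rewrite leaves the text unchanged, lower otherwise.

    `rewrite` is a hypothetical placeholder for an LLM rewriting call.
    """
    rewritten = rewrite(text)
    longest = max(len(text), len(rewritten))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(text, rewritten) / longest


def looks_machine_generated(text: str,
                            rewrite: Callable[[str], str],
                            threshold: float = 0.8) -> bool:
    """Flag text that the rewriter barely changes (threshold is an assumption)."""
    return rewrite_similarity(text, rewrite) >= threshold
```

In practice `rewrite` would wrap a prompt such as "Rewrite this text" sent to a black-box LLM, and the threshold (or a small classifier over several such scores) would be fit on labeled data.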

Paper Structure (17 sections, 1 equation, 9 figures, 15 tables, 3 algorithms)

Figures (9)

  • Figure 1: We introduce "Detecting via Rewriting," an approach that detects machine-generated text by measuring the modifications made during rewriting. We show character deletions in red and character insertions in orange. Human-generated text tends to trigger more modifications than machine-generated text when an LLM is asked to rewrite it. Our method is simple and effective, requiring minimal access to the LLM while being robust to novel text input.
  • Figure 2: The rewriting similarity score of human and GPT-generated text. The similarity score measures how similar the text is before and after rewriting. A larger similarity score indicates that rewriting makes fewer changes. (a) We show the similarity score under a single transformation; machine-generated text (red) is more invariant under rewriting than human-generated text. (b) We show the similarity score under a transformation and its reverse transformation; the machine-generated text is more equivariant under transformation. (c) We show the uncertainty of text produced by humans and GPT. GPT input is more stable than human input. Results are computed on 4000 samples from the Yelp Review dataset. The discrepancies in invariance, equivariance, and output uncertainty allow us to detect machine-generated text.
  • Figure 3: Examples of text rewriting on six datasets for invariance. We use a green background to indicate human-written text, and a red background to indicate machine-generated text. We show the character deletion in red and the character insertion in orange. Human-written text tends to be modified more than machine-generated text. Our detection algorithm relies on this difference to make predictions.
  • Figure 4: Examples for equivariance. We show an example on the Yelp Review dataset. For simplicity, we use the identity transformation as $p$ and the "opposite meaning" prompt as the equivariance transformation $T$. GPT data tends to be consistent with the original input after transformation and reversal.
  • Figure 5: Detection performance as input length increases. On the Yelp dataset, we show that longer input often enables better detection performance. The number next to each dot gives the number of samples, which is also reflected by the size of the dot.
  • ...and 4 more figures
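
The equivariance and uncertainty signals described in the captions of Figures 2 and 4 can be illustrated with a short Python sketch. It rests on several assumptions: `rewrite`, `transform`, and `inverse` are hypothetical callables standing in for LLM prompts (for example "rewrite to the opposite meaning" and its reversal), the function names are introduced here for illustration, and the similarity is `difflib`'s matching ratio rather than the edit-distance measure the paper describes. The exact scoring used by the authors may differ.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable

TextFn = Callable[[str], str]  # placeholder type for an LLM-backed text function


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def equivariance_score(text: str, transform: TextFn,
                       inverse: TextFn, rewrite: TextFn) -> float:
    """Compare the input with its round trip: transform, rewrite, then reverse.

    A high score means the text survives the transformation and its reversal
    largely intact, which the captions associate with machine-generated input.
    """
    round_trip = inverse(rewrite(transform(text)))
    return similarity(text, round_trip)


def uncertainty_score(text: str, rewrite: TextFn, n: int = 3) -> float:
    """Average pairwise dissimilarity among n independent rewrites.

    Returns 0.0 when all rewrites agree (or when n < 2).
    """
    outputs = [rewrite(text) for _ in range(n)]
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0
    mean_sim = sum(similarity(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_sim
```

With a sampling-based LLM behind `rewrite`, repeated calls yield different outputs, so `uncertainty_score` captures how stable the rewrites are; together with an invariance score, these values could serve as input features for a binary detector.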