DAMAGE: Detecting Adversarially Modified AI Generated Text

Elyas Masrour; Bradley Emi; Max Spero

TL;DR

This work addresses the vulnerability of AI-generated text detectors to humanizers that rewrite content to evade detection. It introduces DAMAGE, a detector trained with data-centric augmentation to learn invariances to both human and machine paraphrasing, achieving strong performance on standard AI text and RAID-style adversarial attacks. The study analyzes humanizer transformation patterns, demonstrates that even detector-specific adversarial humanization cannot fully erase detectable cues, and shows the detector generalizes across unseen tools. The findings have practical implications for deploying reliable AI-text detectors in education, SEO, and content moderation while suggesting pathways for further strengthening detector robustness.

Abstract

AI humanizers are a new class of online software tools that paraphrase and rewrite AI-generated text so that it evades AI detection software. We study 19 AI humanizer and paraphrasing tools and qualitatively assess their effects and their faithfulness in preserving the meaning of the original text. We show that many existing AI detectors fail to detect humanized text. Finally, using a data-centric augmentation approach, we demonstrate a robust model that can detect humanized AI text while maintaining a low false positive rate. We also attack our own detector by fine-tuning a model optimized against its predictions, and show that the detector's cross-humanizer generalization is sufficient to remain robust to this attack.

Paper Structure (39 sections, 7 figures, 9 tables)

Introduction
Related Work
AI Detection
Evading AI Detection
Watermarking
Benchmarking
Humanizer Market Survey
Tool Research and Selection
Humanizers are often themselves LLMs
Humanizers are popular on the GPT Store
Humanizers are capable of removing watermarks
Humanized Text Audit
Approach
Insight: Nonsensical Phrases
Insight: Varying Structural Continuity
...and 24 more sections

Figures (7)

Figure 1: Example of an AI humanizer tool
Figure 2: Augmenting the training set with high-quality humanizer data improves robustness.
Figure 3: Two out of the four most popular Writing Custom GPTs are Humanizers
Figure 4: This paraphraser performs a very close paraphrase, only replacing individual words and phrases rather than rewriting entire sentences and paragraphs.
Figure 5: We segment humanizers into three tiers, based on their fluency.
...and 2 more figures
