Fooling the Watchers: Breaking AIGC Detectors via Semantic Prompt Attacks

Run Hao, Peng Ying

TL;DR

This work addresses the vulnerability of AIGC detectors to semantic prompt attacks in text-to-image portrait generation. It introduces a grammar-tree-based prompt generator combined with UCT-RAND, a variant of Monte Carlo Tree Search, to automatically and efficiently explore semantically rich prompts that evade detectors across multiple generation models. The approach evades both open-source and commercial detectors and ranked first in a real-world adversarial AIGC detection competition, while also enabling the creation of diverse adversarial datasets for robustness training. The study highlights detector fragility to semantic manipulation and offers a practical framework for producing challenging evaluation data and guiding more robust AIGC detection and defense strategies.

Abstract

The rise of text-to-image (T2I) models has enabled the synthesis of photorealistic human portraits, raising serious concerns about identity misuse and the robustness of AIGC detectors. In this work, we propose an automated adversarial prompt generation framework that leverages a grammar tree structure and a variant of the Monte Carlo tree search algorithm to systematically explore the semantic prompt space. Our method generates diverse, controllable prompts that consistently evade both open-source and commercial AIGC detectors. Extensive experiments across multiple T2I models validate its effectiveness, and the approach ranked first in a real-world adversarial AIGC detection competition. Beyond attack scenarios, our method can also be used to construct high-quality adversarial datasets, providing valuable resources for training and evaluating more robust AIGC detection and defense systems.
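
To make the idea of searching a grammar-structured prompt space concrete, the following is a minimal sketch and not the authors' implementation. The summary does not spell out how UCT-RAND works, so the sketch substitutes plain UCT selection with random expansion and random rollouts. The slot grammar `SLOTS`, the helper names, and above all `toy_reward` are hypothetical: in a real pipeline the reward would come from generating an image with a T2I model and querying a detector, which is replaced here by a hand-written stand-in.

```python
import math
import random

# Hypothetical slot grammar: a prompt is one choice per slot, in order.
SLOTS = [
    ["a portrait of a person", "a close-up of a singer"],
    ["in soft window light", "under dazzling stage lights"],
    ["in a quiet library", "at a live concert"],
]


def render(choices):
    """Turn a sequence of per-slot indices into a prompt string."""
    return ", ".join(SLOTS[i][c] for i, c in enumerate(choices))


def toy_reward(prompt):
    """Stand-in for (1 - detector score). NOT a real detector (assumption)."""
    reward = 0.25
    if "dazzling" in prompt:
        reward += 0.5
    if "concert" in prompt:
        reward += 0.25
    return reward


class Node:
    """A partial prompt: the choices made for the first len(choices) slots."""

    def __init__(self, choices=()):
        self.choices = choices
        self.children = {}
        self.visits = 0
        self.total = 0.0

    def is_terminal(self):
        return len(self.choices) == len(SLOTS)

    def untried(self):
        if self.is_terminal():
            return []
        options = range(len(SLOTS[len(self.choices)]))
        return [c for c in options if c not in self.children]


def uct_child(node, c=1.4):
    """Pick the child maximising mean reward plus the UCB1 exploration bonus."""
    def score(child):
        mean = child.total / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=score)


def search(iterations, rng):
    root = Node()
    best_prompt, best_reward = None, float("-inf")
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend through fully expanded nodes using UCT.
        while not node.is_terminal() and not node.untried():
            node = uct_child(node)
            path.append(node)
        # Expansion: add one untried choice, picked at random.
        options = node.untried()
        if options:
            choice = rng.choice(options)
            child = Node(node.choices + (choice,))
            node.children[choice] = child
            node = child
            path.append(node)
        # Rollout: fill the remaining slots at random and score the prompt.
        choices = list(node.choices)
        while len(choices) < len(SLOTS):
            choices.append(rng.randrange(len(SLOTS[len(choices)])))
        prompt = render(choices)
        reward = toy_reward(prompt)
        if reward > best_reward:
            best_prompt, best_reward = prompt, reward
        # Backpropagation: update statistics along the visited path.
        for visited in path:
            visited.visits += 1
            visited.total += reward
    return best_prompt, best_reward


best_prompt, best_reward = search(50, random.Random(0))
print(best_prompt, best_reward)
```

The fixed-order slot list is a simplification of a grammar tree: each tree level corresponds to one production choice, so a root-to-leaf path is a complete prompt. A richer grammar with nested non-terminals would change `render` and `untried`, but the select, expand, rollout, and backpropagate loop would keep the same shape.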

Paper Structure

This paper contains 18 sections, 2 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The adversary uses a lighting-based attack to generate images that can evade the detector.
  • Figure 2: Generated by the flux-schnell model using the prompt: <A portrait of a person> + "with the rest of the page filled entirely with clear text."
  • Figure 3: Generated using the wanx2.0-t2i-turbo model with the prompt: 'Jay Chou's live concert, clear facial features, dazzle.' The Zhuque AIGC Detector estimates a 24.3% probability that the image is AI-generated.
  • Figure 4: An illustration of generating prompts designed to bypass the AIGC detector.
  • Figure 5: Illustration of the Grammar Tree.