Universal Anti-forensics Attack against Image Forgery Detection via Multi-modal Guidance

Haipeng Li, Rongxuan Peng, Anwei Luo, Shunquan Tan, Changsheng Chen, Anastasia Antsiferova

TL;DR

This work identifies a systemic vulnerability in AIGC forensics: the widespread use of shared upstream backbones such as CLIP enables universal anti-forensics attacks. It introduces ForgeryEraser, a universal framework that uses a multi-modal guidance loss $L_{MMG}$ to steer forged image embeddings toward text-derived authentic anchors while repelling them from forgery anchors. A source-aware strategy distinguishes global synthesis from local editing. The method substantially degrades six detectors on both global synthesis and local editing benchmarks and remains robust under common distortions. It can even steer detectors' explanations toward authenticity. The results highlight the need to rethink reliance on upstream semantic representations and to develop defenses resilient to semantic-level manipulation and to threats against interpretability in digital media forensics.

Abstract

The rapid advancement of AI-Generated Content (AIGC) technologies poses significant challenges for authenticity assessment. However, existing evaluation protocols largely overlook anti-forensics attacks and therefore fail to ensure the comprehensive robustness of state-of-the-art AIGC detectors in real-world applications. To bridge this gap, we propose ForgeryEraser, a framework designed to execute universal anti-forensics attacks without access to the target AIGC detectors. We reveal an adversarial vulnerability stemming from the systemic reliance on Vision-Language Models (VLMs), such as CLIP, as shared backbones: downstream AIGC detectors inherit the feature space of these publicly accessible models. Instead of traditional logit-based optimization, we design a multi-modal guidance loss that drives forged image embeddings within the VLM feature space toward text-derived authentic anchors, erasing forgery traces, while repelling them from forgery anchors. Extensive experiments demonstrate that ForgeryEraser substantially degrades the performance of advanced AIGC detectors on both global synthesis and local editing benchmarks. Moreover, ForgeryEraser induces explainable forensic models to generate, for forged images, explanations consistent with those for authentic images. Our code will be made publicly available.
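The multi-modal guidance loss and the source-aware strategy are described above only in words. The following Python sketch illustrates them under one plausible, assumed form of the loss. Minimizing it lowers the negated mean cosine similarity to the authentic anchors and, weighted by a factor `lam`, the mean cosine similarity to the forgery anchors. It is not the authors' exact $L_{MMG}$. The weight `lam`, the helper names (`select_prompts`, `cosine`, `mmg_loss`), and the prompt strings in `ANCHOR_PROMPTS` are all hypothetical. Embeddings are taken as precomputed vectors. In practice they would come from a VLM encoder such as CLIP.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

# Hypothetical prompt sets for the source-aware strategy; the paper's actual
# fine-grained attribute descriptions are not reproduced here.
ANCHOR_PROMPTS: Dict[str, Dict[str, List[str]]] = {
    "global_synthesis": {
        "authentic": ["a real photograph taken with a camera"],
        "forgery": ["an AI-generated synthetic image"],
    },
    "local_editing": {
        "authentic": ["an unedited photograph with consistent details"],
        "forgery": ["a photograph with a locally manipulated region"],
    },
}


def select_prompts(source: str) -> Dict[str, List[str]]:
    """Pick anchor prompts by forgery source (hypothetical source-aware rule)."""
    if source not in ANCHOR_PROMPTS:
        raise ValueError(f"unknown source type: {source!r}")
    return ANCHOR_PROMPTS[source]


def cosine(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def mmg_loss(
    image_emb: Vector,
    authentic_anchors: List[Vector],
    forgery_anchors: List[Vector],
    lam: float = 1.0,
) -> float:
    """Assumed guidance loss: minimizing it pulls the image embedding toward
    the authentic anchors and pushes it away from the forgery anchors."""
    pull = sum(cosine(image_emb, t) for t in authentic_anchors) / len(authentic_anchors)
    push = sum(cosine(image_emb, t) for t in forgery_anchors) / len(forgery_anchors)
    return -pull + lam * push


if __name__ == "__main__":
    # Toy 2-D vectors standing in for encoded anchor prompts.
    authentic = [[1.0, 0.0]]
    forgery = [[0.0, 1.0]]
    print(select_prompts("local_editing")["forgery"])
    print(mmg_loss([0.0, 1.0], authentic, forgery))  # forgery-like embedding: 1.0
    print(mmg_loss([1.0, 0.0], authentic, forgery))  # authentic-like embedding: -1.0
```

In an actual attack, the image embedding would be a differentiable function of the perturbed image. The perturbation would then be updated by gradient descent on a loss of this kind, typically under a perturbation budget. That optimization loop is omitted here.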

Paper Structure (19 sections, 3 equations, 11 figures, 10 tables)

Figures (11)

  • Figure 1: Universal Anti-forensics Attack with ForgeryEraser. Top: A standard forensic model correctly identifies synthetic artifacts. Bottom: By guiding embeddings within the shared backbone toward authentic anchors, our method causes downstream detectors to invert their verdicts and fabricate plausible justifications.
  • Figure 2: Overview of the ForgeryEraser framework. The optimization pipeline incorporates Differentiable Resampling to bridge the resolution gap while suppressing aliasing artifacts. Based on a source-aware strategy, the multi-modal guidance loss pulls the image embeddings toward the selected authentic anchors (Green) while pushing them away from forgery anchors (Orange), effectively erasing manipulation traces within the shared feature space.
  • Figure 3: Feature Space Visualization (t-SNE). Projections of CLIP embeddings for (a) Dog and (b) Cat samples from the ProGAN subset, visualizing Real and Fake images before and after the attack.
  • Figure 4: Manipulating Interpretability on SIDA (Left) and FakeVLM (Right). Top Row: Detectors correctly localize and describe visual artifacts on clean images. Bottom Row: Under the ForgeryEraser attack, models are induced to fabricate justifications for authenticity. Note that matching text colors across rows highlight opposing descriptions generated for identical visual features before and after the attack.
  • Figure 5: Visualization of Semantic Anchors. Comparison of text guidance strategies with varying granularities: Untargeted (no text guidance), Coarse-grained (generic class labels), and ForgeryEraser (fine-grained attribute descriptions defined by the source-aware strategy).
  • ...and 6 more figures
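The Figure 2 caption mentions Differentiable Resampling. It bridges the resolution gap, presumably between input images and the backbone's fixed input size, while suppressing aliasing. The paper's exact resampling kernel is not given here. The Python sketch below uses area averaging as a stand-in: a box low-pass filter followed by decimation. Because the operation is linear, it is differentiable, and `area_downsample_backward` computes its vector-Jacobian product. All function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def area_downsample(img: Image, factor: int) -> Image:
    """Average non-overlapping factor x factor blocks (box low-pass + decimation)."""
    h, w = len(img), len(img[0])
    if factor < 1 or h % factor or w % factor:
        raise ValueError("image height and width must be divisible by factor")
    area = factor * factor
    return [
        [
            sum(
                img[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ) / area
            for j in range(w // factor)
        ]
        for i in range(h // factor)
    ]


def area_downsample_backward(grad_out: Image, factor: int) -> Image:
    """Vector-Jacobian product of area_downsample: every input pixel receives
    the upstream gradient of its block divided by the block area."""
    area = factor * factor
    return [
        [grad_out[i // factor][j // factor] / area for j in range(len(grad_out[0]) * factor)]
        for i in range(len(grad_out) * factor)
    ]


def strided_downsample(img: Image, factor: int) -> Image:
    """Naive decimation (keep one pixel per block), shown only for comparison."""
    return [row[::factor] for row in img[::factor]]


if __name__ == "__main__":
    # 4x4 checkerboard: the highest-frequency pattern a 4x4 grid can hold.
    checker = [[float((i + j) % 2) for j in range(4)] for i in range(4)]
    print(strided_downsample(checker, 2))  # [[0.0, 0.0], [0.0, 0.0]]: aliased to flat black
    print(area_downsample(checker, 2))     # [[0.5, 0.5], [0.5, 0.5]]: true local mean
    print(area_downsample_backward([[1.0]], 2))  # [[0.25, 0.25], [0.25, 0.25]]
```

On the checkerboard, naive striding collapses the pattern to a false uniform value, while area averaging preserves the local mean. In a PyTorch pipeline, `torch.nn.functional.interpolate` with `mode="bilinear"` and `antialias=True` is one existing differentiable, anti-aliased alternative. Whether the authors use it is not stated here.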