Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection

Vincenzo De Rosa, Fabrizio Guillaro, Giovanni Poggi, Davide Cozzolino, Luisa Verdoliva

TL;DR

This paper studies the adversarial robustness of AI-generated image detectors, focusing on Contrastive Language-Image Pretraining (CLIP)-based methods that rely on Visual Transformer (ViT) backbones and comparing their performance with CNN-based methods.

Abstract

In recent years, many forensic detectors have been proposed to detect AI-generated images and prevent their use for malicious purposes. Convolutional neural networks (CNNs) have long been the dominant architecture in this field and have been the subject of intense study. However, recently proposed Transformer-based detectors have been shown to match or even outperform CNN-based detectors, especially in terms of generalization. In this paper, we study the adversarial robustness of AI-generated image detectors, focusing on Contrastive Language-Image Pretraining (CLIP)-based methods that rely on Visual Transformer (ViT) backbones and comparing their performance with CNN-based methods. We study the robustness to different adversarial attacks under a variety of conditions and analyze both numerical results and frequency-domain patterns. CLIP-based detectors are found to be vulnerable to white-box attacks just like CNN-based detectors. However, attacks do not easily transfer between CNN-based and CLIP-based methods. This is also confirmed by the different distribution of the adversarial noise patterns in the frequency domain. Overall, this analysis provides new insights into the properties of forensic detectors that can help to develop more effective strategies.
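
To make the white-box setting concrete, the following is a minimal sketch of an $l_2$-PGD attack. It is not the paper's implementation: the detector here is a hypothetical toy logistic model (`w`, `b`) standing in for a real CNN- or CLIP-based detector, and `eps`, `step`, and `n_iter` are illustrative values on a [0, 1] pixel scale. The attack repeatedly steps along the normalized loss gradient and projects the perturbation back onto an $l_2$ ball of radius `eps`.

```python
import numpy as np

def detector_logit(x, w, b):
    """Toy stand-in detector (assumption): logit > 0 means 'AI-generated'."""
    return float(x.ravel() @ w + b)

def l2_pgd(x, w, b, eps, step, n_iter=20):
    """l2-PGD that pushes a 'fake' image toward the 'real' decision.

    x    : image with values in [0, 1]
    w, b : parameters of the toy logistic detector (hypothetical)
    eps  : radius of the l2 ball bounding the perturbation
    step : l2 length of each gradient step
    """
    x_adv = x.copy()
    for _ in range(n_iter):
        p_fake = 1.0 / (1.0 + np.exp(-detector_logit(x_adv, w, b)))
        # Cross-entropy loss for the true label 'fake' is -log(p_fake);
        # its gradient w.r.t. the input of a linear model is -(1 - p_fake) * w.
        grad = (-(1.0 - p_fake) * w).reshape(x.shape)
        # Ascend the loss along the l2-normalized gradient.
        x_adv = x_adv + step * grad / (np.linalg.norm(grad) + 1e-12)
        # Project the perturbation back onto the l2 ball of radius eps.
        delta = x_adv - x
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta = delta * (eps / norm)
        # Keep the attacked image a valid image.
        x_adv = np.clip(x + delta, 0.0, 1.0)
    return x_adv
```

In a transfer (black-box) setting, the same `x_adv` would be computed on one detector and then evaluated on a different one; with a real network the analytic gradient above would be replaced by automatic differentiation.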

Paper Structure (9 sections, 1 equation, 6 figures, 1 table)

Figures (6)

  • Figure 1: CLIP vs. ResNet. $l_2$-PGD attacks on ResNet-based (top) and CLIP-based (bottom) detectors. From left to right: attacked image, magnified adversarial perturbation, average spectrum of the adversarial noise. Attacks on Transformer-based detectors concentrate on lower frequencies than attacks on CNN-based detectors. In addition, they present a clear cross-shaped directional spectrum, as also shown by Bhojanapalli et al. (2021) for image classification, which is due to the patch-wise processing before the transformer blocks.
  • Figure 2: From left to right: attacked image; magnified noise patterns generated by RWA and UA for a CNN-based detector; magnified noise patterns generated by RWA and UA for a CLIP-based detector. The observations made for $l_2$-PGD in Fig. 1 also hold for these attacks: perturbations crafted against CLIP-based detectors are more structured and show clear regular patterns.
  • Figure 3: Successful Attack Rate (SAR) of four attacks (PGD, DI$^2$-FGSM, RWA, UA) at two strength levels ($\epsilon=8$, $\epsilon=16$) on the eight detectors of Tab. 1. Cells on the diagonal correspond to white-box attacks. Off-diagonal cells correspond to transferred attacks.
  • Figure 4: Transferability of $l_2$-PGD attacks as a function of the attack strength $\epsilon$. Solid lines represent the average SAR and colored bands its standard deviation. We consider target and source detectors belonging to the same family (top-left, bottom-right) or different families (top-right, bottom-left).
  • Figure 5: Power spectra of adversarial noise patterns generated by a specific attack (rows) on a selected detector (columns). From top to bottom: $l_2$-PGD, DI$^2$-FGSM, RWA, UA attacks. From left to right: detectors (1) to (8) listed in Table 1.
  • ...and 1 more figure
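
The frequency-domain analysis referred to in the captions of Figures 1 and 5 can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code: it assumes the adversarial noise is available as a stack of single-channel residuals (attacked image minus original), and the function names and the `radius` parameter are hypothetical. It averages the squared FFT magnitude over the stack and, as a rough summary, measures how much of the energy lies near the center of the shifted spectrum, i.e. at low frequencies.

```python
import numpy as np

def average_power_spectrum(noises):
    """Average 2D power spectrum of a stack of noise patterns.

    noises : array of shape (N, H, W), each entry attacked - original.
    Returns an (H, W) array with the zero frequency shifted to the center.
    """
    noises = np.asarray(noises, dtype=np.float64)
    # Remove each pattern's mean so the DC term does not dominate the plot.
    noises = noises - noises.mean(axis=(1, 2), keepdims=True)
    power = np.abs(np.fft.fft2(noises, axes=(1, 2))) ** 2
    return np.fft.fftshift(power.mean(axis=0))

def low_frequency_fraction(spectrum, radius):
    """Fraction of spectral energy within `radius` bins of the center."""
    h, w = spectrum.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    return float(spectrum[dist <= radius].sum() / spectrum.sum())
```

Under these assumptions, a larger `low_frequency_fraction` for noise crafted against one detector than for another would be consistent with the caption's observation that the former attack works on lower frequencies; a patch-aligned periodic structure would show up as isolated peaks along the horizontal and vertical axes of the spectrum.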