Take Fake as Real: Realistic-like Robust Black-box Adversarial Attack to Evade AIGC Detection

Caiyun Xie, Dengpan Ye, Yunming Zhang, Long Tang, Yunna Lv, Jiacheng Deng, Jiawei Song

TL;DR

The paper tackles the vulnerability of GAN- and diffusion-based AIGC detectors to adversarial attacks in real-world settings. It introduces R$^2$BA, a realistic-like robust black-box attack that fuses Gaussian blur, JPEG compression, Gaussian noise, and light spots, optimized via stochastic PSO with inertia decay to cross the detector boundary at a fake probability of $0.5$ while preserving image quality. The authors demonstrate substantial improvements in anti-detection performance (up to 38–41% ASR gains) and image invisibility (BRISQUE/SSIM) across multiple detectors and datasets, including a commercial API. The work highlights the practical security risks in AIGC detection and provides a benchmark for evaluating detector robustness under realistic post-processing conditions.

Abstract

The security of AI-generated content (AIGC) detection is crucial for ensuring multimedia content credibility. To enhance detector security, research on adversarial attacks has become essential. However, most existing adversarial attacks focus only on detectors of GAN-generated facial images, are far less effective against multi-class natural images and diffusion-based detectors, and exhibit poor invisibility. To fill this gap, we first conduct an in-depth analysis of the vulnerability of AIGC detectors and find that detectors vary in their vulnerability to different post-processing operations. Then, given this finding and considering that the detector is unknown to the attacker in real-world scenarios, we propose a Realistic-like Robust Black-box Adversarial attack (R$^2$BA) with post-processing fusion optimization. Unlike typical perturbations, R$^2$BA uses real-world post-processing, i.e., Gaussian blur, JPEG compression, Gaussian noise, and light spot, to generate adversarial examples. Specifically, we use a stochastic particle swarm algorithm with inertia decay to optimize the post-processing fusion intensity and explore the detector's decision boundary. Guided by the detector's fake probability, R$^2$BA strengthens the post-processing to which the detector is vulnerable and weakens the post-processing to which it is robust, striking a balance between adversariality and invisibility. Extensive experiments on popular and commercial AIGC detectors and datasets demonstrate that R$^2$BA exhibits impressive anti-detection performance, excellent invisibility, and strong robustness in GAN-based and diffusion-based cases. Compared to state-of-the-art white-box and black-box attacks, R$^2$BA improves anti-detection performance by 15%–72% and 21%–47% under the original and robust scenarios, respectively, offering valuable insights for the security of AIGC detection in real-world applications.
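
As a rough illustration of the optimization described in the abstract, the following Python sketch runs a standard particle swarm with linearly decaying inertia over four post-processing intensities. It is not the authors' implementation: `toy_fake_probability` is a hypothetical stand-in for black-box detector queries, its weights are invented, `visibility_cost` is an assumed proxy for lost image quality, and the hyperparameters (`w_start`, `w_end`, `c1`, `c2`) are conventional PSO defaults rather than values from the paper.

```python
import random


def toy_fake_probability(intensities):
    """Hypothetical black-box detector: returns a fake probability in [0, 1].

    The weights are invented for illustration; a real attack would query
    the detector on the post-processed image instead.
    """
    weights = (0.5, 0.3, 0.6, 0.1)  # blur, JPEG, noise, light spot (assumed)
    score = sum(w * x for w, x in zip(weights, intensities))
    return max(0.0, 1.0 - score)


def visibility_cost(intensities):
    """Assumed proxy for lost image quality: stronger processing costs more."""
    return sum(intensities)


def fitness(intensities, boundary=0.5):
    """Lower is better. Any successful particle beats any unsuccessful one."""
    p = toy_fake_probability(intensities)
    if p < boundary:
        # Attack succeeded: only invisibility matters now (cost is at most 4).
        return visibility_cost(intensities)
    # Attack failed: large offset plus the remaining gap to the boundary.
    return 10.0 + (p - boundary)


def pso_attack(n_particles=20, n_iters=50, w_start=0.9, w_end=0.4,
               c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = 4
    # Each particle position is one vector of post-processing intensities.
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for t in range(n_iters):
        # Inertia decays linearly from w_start to w_end over the iterations.
        w = w_start - (w_start - w_end) * t / max(1, n_iters - 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep intensities inside the normalized range [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, toy_fake_probability(gbest)


if __name__ == "__main__":
    best, prob = pso_attack()
    print("intensities:", [round(x, 3) for x in best])
    print("fake probability:", round(prob, 3))
```

The two-tier fitness is one simple way to encode the stated trade-off: a particle is first pushed across the 0.5 boundary, and only then is its processing strength reduced.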

Paper Structure

This paper contains 18 sections, 19 equations, 6 figures, 11 tables, 1 algorithm.

Figures (6)

  • Figure 1: Principle of our method. We use Gaussian blur, JPEG compression, Gaussian noise, and light spot to process AI-generated images. The post-processed image serves as an adversarial example that leads the detector to classify the AI-generated image as a real (non-AI-generated) image.
  • Figure 2: Average performance degradation. The values represent the attack success rate of different post-processing against AIGC detectors. Darker colors indicate greater vulnerability of detectors. Attacks used include Gaussian blur (GB), JPEG compression (JG), Gaussian noise (GN), light spot (LS), and sequential fusion attacks (ALL means GB-JG-GN-LS).
  • Figure 3: The pipeline of R$^2$BA. It contains four modules and a main optimization process (Initialization-Optimization iteration-Selection). R$^2$BA uses $N$ randomly generated post-processing parameters as particle positions, optimizing their velocities and positions to balance invisibility and adversariality. A successful attack occurs when a particle crosses the detector's decision boundary, and R$^2$BA outputs the adversarial example with the highest SSIM value among the successful ones. The notation is the same as in the paper's notation table.
  • Figure 4: Comparative experiments on image quality of adversarial examples. The dataset we use is GenImage and the detector is FatFormer. We zoom in on the local details at the end and visualize the perturbations across the entire image.
  • Figure 5: The effect of the number of particles on the attack success rate, BRISQUE, SSIM, and the number of queries.
  • ...and 1 more figures
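
The selection step in the Figure 3 caption (keep the successful candidate with the highest SSIM) can be sketched as follows. This is a simplified illustration, not the paper's code: `global_ssim` computes SSIM over a single window covering the whole image rather than the usual sliding-window average, images are assumed to be flat lists of grayscale pixel values, and the function names and the `(pixels, fake_probability)` candidate format are hypothetical.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equal-length flat grayscale images."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Standard SSIM stabilizing constants.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def select_adversarial(original, candidates, boundary=0.5):
    """Return (ssim, pixels, fake_probability) for the best successful candidate.

    `candidates` is a list of (pixels, fake_probability) pairs. A candidate is
    successful when its fake probability falls below the boundary. Returns
    None if no candidate succeeds.
    """
    successful = [
        (global_ssim(original, pixels), pixels, prob)
        for pixels, prob in candidates
        if prob < boundary
    ]
    if not successful:
        return None
    return max(successful, key=lambda item: item[0])


if __name__ == "__main__":
    original = [10, 50, 90, 130]
    candidates = [
        ([12, 52, 92, 132], 0.3),  # mild change, crosses the boundary
        ([10, 50, 90, 130], 0.8),  # unchanged image, still detected as fake
        ([60, 60, 60, 60], 0.1),   # crosses the boundary but destroys structure
    ]
    print(select_adversarial(original, candidates))
```

In this toy input the unchanged image has the best SSIM but is filtered out because it does not cross the boundary, so the mildly changed candidate is returned.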