StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model

Ziyin Zhou, Ke Sun, Zhongxi Chen, Huafeng Kuang, Xiaoshuai Sun, Rongrong Ji

TL;DR

StealthDiffusion addresses the challenge of evading forensic detection of AI-generated images by operating in the latent space of Stable Diffusion to produce high-quality adversarial images. It combines Latent Adversarial Optimization (LAO) with a Control-VAE module that reduces spectral differences between generated and genuine images. Experiments on GenImage across multiple diffusion methods and detectors show that StealthDiffusion achieves a high attack success rate (ASR) in both white-box and black-box settings and yields images whose spectral fingerprints are close to those of genuine images. The work highlights the need for detection models to account for latent-space adversarial manipulation and spectral alignment, informing more robust diffusion detectors and forensic defenses.

Abstract

The rapid progress in generative models has given rise to the critical task of AI-Generated Content Stealth (AIGC-S), which aims to create AI-generated images that can evade both forensic detectors and human inspection. This task is crucial for understanding the vulnerabilities of existing detection methods and developing more robust techniques. However, current adversarial attacks often introduce visible noise, have poor transferability, and fail to address spectral differences between AI-generated and genuine images. To address these issues, we propose StealthDiffusion, a framework based on Stable Diffusion that modifies AI-generated images into high-quality, imperceptible adversarial examples capable of evading state-of-the-art forensic detectors. StealthDiffusion comprises two main components: Latent Adversarial Optimization, which generates adversarial perturbations in the latent space of Stable Diffusion, and Control-VAE, a module that reduces spectral differences between the generated adversarial images and genuine images without affecting the original diffusion model's generation process. Extensive experiments show that StealthDiffusion is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries with frequency spectra similar to genuine images. These forgeries are classified as genuine by advanced forensic classifiers and are difficult for humans to distinguish.
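
The idea of perturbing a latent code rather than image pixels can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `decode` is a tiny stand-in for a VAE decoder, `fake_logit` is a hypothetical linear detector, and the signed-gradient update with an L-infinity budget `eps` is a generic projected-gradient scheme, not necessarily the optimizer or loss used in the paper. Control-VAE and the diffusion process itself are not modeled.

```python
import math

def decode(D, z):
    """Toy decoder x = tanh(D z); a hypothetical stand-in for a VAE decoder."""
    return [math.tanh(sum(d * v for d, v in zip(row, z))) for row in D]

def fake_logit(w, b, x):
    """Toy linear detector: a positive logit means 'AI-generated'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign(g):
    """Return -1, 0, or 1 according to the sign of g."""
    return (g > 0) - (g < 0)

def latent_attack(D, w, b, z0, eps=0.5, step=0.1, iters=20):
    """Lower the detector logit by perturbing the latent z0 within an L-inf ball."""
    delta = [0.0] * len(z0)
    for _ in range(iters):
        z = [a + d for a, d in zip(z0, delta)]
        x = decode(D, z)
        # Chain rule: d logit / d z_j = sum_i w_i * (1 - x_i^2) * D[i][j]
        grad = [
            sum(w[i] * (1.0 - x[i] ** 2) * D[i][j] for i in range(len(D)))
            for j in range(len(z0))
        ]
        # Signed descent step on the logit, then clamp delta to [-eps, eps]
        delta = [
            max(-eps, min(eps, d - step * sign(g)))
            for d, g in zip(delta, grad)
        ]
    return [a + d for a, d in zip(z0, delta)]

# Illustrative values only; none of these come from the paper.
D = [[1.0, 0.5], [-0.3, 0.8], [0.2, -0.6]]
w, b = [1.0, -0.5, 0.7], 0.2
z0 = [0.4, -0.2]

z_adv = latent_attack(D, w, b, z0)
before = fake_logit(w, b, decode(D, z0))
after = fake_logit(w, b, decode(D, z_adv))
print(f"logit before: {before:.3f}, after: {after:.3f}")
```

Because the perturbation is applied before decoding, the change in the decoded output is shaped by the decoder rather than added as raw pixel noise, which is the intuition behind optimizing in latent space.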

Paper Structure (15 sections, 10 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Quantitative and qualitative comparison: (a) Visual examples of spectral images comparing baseline methods and our method. The baseline results still contain visible artifacts, whereas the spectral images produced by our proposed method are most similar to those of genuine images. (b) Visualization of adversarial examples generated by baseline methods and our method. Our method achieves higher image quality. (c) Quantitative performance comparison of baseline methods and our method on GenImage [zhu2023genimage].
  • Figure 2: Overview of our method. We introduce a small adversarial noise to the raw image using the PGD [madry2017towards] method, then perform adversarial optimization in the latent space of Stable Diffusion; the final output image is obtained by combining the outputs from the Control-VAE. This refined image will be recognized as a genuine image by the forensic detector.
  • Figure 3: Fourier transform (amplitude) of the artificial fingerprint estimated from 1000 image noise residuals. From left to right: genuine images from ImageNet [deng2009imagenet]; BigGAN [brock2018large], a Generative Adversarial Network (GAN); ADM [dhariwal2021diffusion], a Denoising Diffusion Probabilistic Model (DDPM); and Stable Diffusion (1.4 and 1.5) [rombach2022high], a Latent Diffusion Model (LDM).
  • Figure 4: The proposed Control-VAE framework extends the traditional VAE structure by incorporating a residual structure with trainable convolutions to pass the feature maps from the encoder to the decoder. (See the Control-VAE section for more details.)
  • Figure 5: Qualitative assessment of adversarial examples generated by FGSM [goodfellow2014explaining], PGD [madry2017towards], AutoAttack (AA) [croce2020reliable], DiffAttack [chen2023diffusion], Diff-PGD [xue2023diffusion], and our method on the GenImage dataset [zhu2023genimage]. These samples were generated from different backbones, namely EfficientNet-B0 (E) [tan2019efficientnet], ResNet-50 (R) [he2016deep], DeiT (D) [touvron2021training], and Swin-T (S) [liu2021swin].
  • ...and 1 more figure