StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model
Ziyin Zhou, Ke Sun, Zhongxi Chen, Huafeng Kuang, Xiaoshuai Sun, Rongrong Ji
TL;DR
StealthDiffusion tackles the problem of evading forensic detection of AI-generated images by operating in the latent space of Stable Diffusion to produce high-quality adversarial images. It combines Latent Adversarial Optimization (LAO) with a Control-VAE module that reduces spectral differences between generated and genuine images. Experiments on GenImage across multiple diffusion methods and detectors show that StealthDiffusion achieves a high attack success rate (ASR) in both white-box and black-box settings and yields images whose spectral fingerprints are close to those of genuine images. The work highlights the need for detection models to account for latent-space adversarial manipulation and spectral alignment, informing more robust diffusion detectors and forensic defenses.
Abstract
The rapid progress in generative models has given rise to the critical task of AI-Generated Content Stealth (AIGC-S), which aims to create AI-generated images that can evade both forensic detectors and human inspection. This task is crucial for understanding the vulnerabilities of existing detection methods and developing more robust techniques. However, current adversarial attacks often introduce visible noise, have poor transferability, and fail to address spectral differences between AI-generated and genuine images. To address these limitations, we propose StealthDiffusion, a framework based on Stable Diffusion that modifies AI-generated images into high-quality, imperceptible adversarial examples capable of evading state-of-the-art forensic detectors. StealthDiffusion comprises two main components: Latent Adversarial Optimization, which generates adversarial perturbations in the latent space of Stable Diffusion, and Control-VAE, a module that reduces spectral differences between the generated adversarial images and genuine images without affecting the original diffusion model's generation process. Extensive experiments show that StealthDiffusion is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries with frequency spectra similar to those of genuine images. These forgeries are classified as genuine by advanced forensic classifiers and are difficult for humans to distinguish.
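To make the idea of optimizing a perturbation in latent space (instead of pixel space) concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical linear "decoder" in place of the Stable Diffusion VAE, a hypothetical logistic "detector" in place of a real forensic classifier, and a sign-gradient update with an L-infinity budget `eps`. All of these choices, names, and parameter values are illustrative assumptions, and the Control-VAE spectral alignment step is omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode(D, z):
    # Toy linear decoder (assumption): image = D @ z
    return [sum(d * zj for d, zj in zip(row, z)) for row in D]

def prob_fake(w, b, image):
    # Toy logistic detector (assumption): probability the image is AI-generated
    return sigmoid(sum(wi * xi for wi, xi in zip(w, image)) + b)

def latent_attack(D, w, b, z, eps=0.5, step=0.05, iters=50):
    """Perturb the latent z within an L-inf ball of radius eps so the
    decoded image is scored as less likely to be fake."""
    # For loss = -log(1 - p), the gradient w.r.t. the latent is p * (D^T w).
    dt_w = [sum(D[i][j] * w[i] for i in range(len(D))) for j in range(len(z))]
    delta = [0.0] * len(z)
    for _ in range(iters):
        z_adv = [zj + dj for zj, dj in zip(z, delta)]
        p = prob_fake(w, b, decode(D, z_adv))
        grad = [p * g for g in dt_w]
        # Signed descent step, then clip back into the eps budget.
        delta = [
            max(-eps, min(eps, dj - step * math.copysign(1.0, gj))) if gj != 0.0 else dj
            for dj, gj in zip(delta, grad)
        ]
    return [zj + dj for zj, dj in zip(z, delta)]

# Toy setup: 8-dimensional latent, 16-"pixel" image, fixed seed.
rng = random.Random(0)
D = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(16)]
w = [rng.uniform(-1, 1) for _ in range(16)]
z = [rng.uniform(-1, 1) for _ in range(8)]
# Calibrate the bias so the clean image has logit 2.0 (flagged as fake).
b = 2.0 - sum(wi * xi for wi, xi in zip(w, decode(D, z)))

p_before = prob_fake(w, b, decode(D, z))
z_adv = latent_attack(D, w, b, z)
p_after = prob_fake(w, b, decode(D, z_adv))
print(f"P(fake) before: {p_before:.3f}, after: {p_after:.3f}")
```

In this toy setting the attack has full access to the detector's weights, which loosely corresponds to the white-box case; the black-box case discussed in the abstract would instead evaluate the resulting image against a detector that was not used during optimization.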
