Enhancing Frequency Forgery Clues for Diffusion-Generated Image Detection

Daichi Zhang, Tong Zhang, Shiming Ge, Sabine Süsstrunk

TL;DR

The paper addresses the challenge of detecting diffusion-generated images in a way that generalizes to unseen diffusion models and remains robust to perturbations. It analyzes frequency-domain differences between real and diffusion-generated images and introduces the $F^2C$ representation, which applies a frequency-selective function $w(f)$ to the Fourier spectrum before classification. The function combines a low-frequency cutoff $\tau$ with a kernel-based weighting: with $k(f)=-0.2f^2+0.8f-0.05$, it sets $w(f)=0$ for $f\le\tau$ and $w(f)=(e^{k(f)/2}-1)/f$ for $f>\tau$, enabling discrimination across all bands. Experiments on GenImage, UniformerDiffusion, and DiffusionForensics show state-of-the-art generalization to unseen diffusion models and robustness to perturbations. These results highlight the practical utility of detectors built on spectral discrepancies and suggest extensions to broader AIGC detection tasks.
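The weighting above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes $f$ is the radial frequency normalized so that the per-axis Nyquist frequency equals 1, that $w(f)$ multiplies the Fourier magnitude of a single grayscale image, and that the cutoff `tau=0.1` is a hypothetical value. `radial_frequency` and `f2c_weighted_spectrum` are hypothetical helper names.

```python
import numpy as np


def k(f: np.ndarray) -> np.ndarray:
    """Kernel k(f) = -0.2 f^2 + 0.8 f - 0.05, as given in the TL;DR."""
    return -0.2 * f**2 + 0.8 * f - 0.05


def w(f: np.ndarray, tau: float) -> np.ndarray:
    """Frequency-selective weight: 0 for f <= tau, (exp(k(f)/2) - 1) / f for f > tau."""
    if tau < 0:
        raise ValueError("tau must be non-negative so that f = 0 is always masked out")
    f = np.asarray(f, dtype=float)
    out = np.zeros_like(f)
    mask = f > tau  # tau >= 0 guarantees f > 0 here, so the division is safe
    out[mask] = (np.exp(k(f[mask]) / 2.0) - 1.0) / f[mask]
    return out


def radial_frequency(height: int, width: int) -> np.ndarray:
    """Hypothetical helper: radial frequency of each FFT bin, scaled so the per-axis Nyquist is 1."""
    fy = np.fft.fftfreq(height) * 2.0  # fftfreq gives cycles/sample in [-0.5, 0.5)
    fx = np.fft.fftfreq(width) * 2.0
    return np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)


def f2c_weighted_spectrum(image: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Hypothetical helper: Fourier magnitude of a grayscale image weighted by w(f).

    The output is in unshifted FFT order (DC term at index [0, 0]).
    """
    magnitude = np.abs(np.fft.fft2(image))
    return magnitude * w(radial_frequency(*image.shape), tau)


if __name__ == "__main__":
    img = np.random.default_rng(0).random((64, 64))
    rep = f2c_weighted_spectrum(img, tau=0.1)
    print(rep.shape)  # (64, 64)
    print(rep[0, 0])  # 0.0: the DC term falls below the cutoff
```

Under this assumed normalization, $k(f)$ crosses zero near $f\approx0.064$, so a cutoff below that value would produce slightly negative weights just above $\tau$; if the paper measures $f$ on a different scale, the effective shape of $w(f)$ changes accordingly.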

Abstract

Diffusion models have achieved remarkable success in image synthesis, but the generated high-quality images raise concerns about potential malicious use. Existing detectors often struggle to capture discriminative clues across different models and settings, limiting their generalization to unseen diffusion models and robustness to various perturbations. To address this issue, we observe that diffusion-generated images exhibit progressively larger differences from natural real images across low- to high-frequency bands. Based on this insight, we propose a simple yet effective representation by enhancing the Frequency Forgery Clue ($F^2C$) across all frequency bands. Specifically, we introduce a frequency-selective function which serves as a weighted filter on the Fourier spectrum, suppressing less discriminative bands while enhancing more informative ones. This approach, grounded in a comprehensive analysis of frequency-based differences between natural real and diffusion-generated images, enables general detection of images from unseen diffusion models and provides resilience to various perturbations. Extensive experiments on various diffusion-generated image datasets demonstrate that our method outperforms state-of-the-art detectors with superior generalization and robustness.
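The frequency analysis mentioned in the abstract can be approximated with an azimuthally averaged power spectrum. The sketch below is an assumption about how such a comparison could be computed, not the paper's exact procedure: it bins the centered power spectrum of each grayscale image by normalized radius, averages over a set of images, and reports the per-band log10 ratio between the diffusion-generated and real sets. The helper names and `n_bins=32` are hypothetical.

```python
import numpy as np


def radial_power_spectrum(image: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Azimuthally averaged power spectrum of a 2-D grayscale image, ordered low to high frequency."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = image.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2)  # distance from the centered DC bin
    bins = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)  # guard against empty bins


def mean_power_spectrum(images, n_bins: int = 32) -> np.ndarray:
    """Average the radial power spectra of a collection of same-sized images."""
    return np.mean([radial_power_spectrum(img, n_bins) for img in images], axis=0)


def spectral_discrepancy(real_images, fake_images, n_bins: int = 32, eps: float = 1e-12) -> np.ndarray:
    """Per-band log10(fake / real); positive values mean the generated set carries more power."""
    real = mean_power_spectrum(real_images, n_bins)
    fake = mean_power_spectrum(fake_images, n_bins)
    return np.log10(fake + eps) - np.log10(real + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = [rng.random((64, 64)) for _ in range(8)]
    fake = [2.0 * img for img in real]  # toy "fake" set: every band gets 4x the power
    print(np.round(spectral_discrepancy(real, fake), 3))  # about 0.602 in every band
```

The toy "fake" set only checks the arithmetic; with real data, the abstract's observation would correspond to the discrepancy growing from the low-frequency to the high-frequency bins.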

Paper Structure

This paper contains 15 sections, 11 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: The magnitude difference between real images and images generated by different diffusion models. The fake images generated by different diffusion models (top) leave traces in their Fourier spectra (middle). We explore their differences from natural real images to detect diffusion-generated images (bottom). Each Fourier spectrum is averaged over 1000 sampled images drawn from the same classes to avoid variance in content. Darker colors indicate smaller magnitudes; lighter colors indicate larger magnitudes.
  • Figure 2: Mean power spectrum of natural real and diffusion-generated images from different diffusion models (a) and different time steps (b). We further explore the spectrum during the denoising process in (c).
  • Figure 3: Overview of our proposed method. We first analyze the discrepancy in mean power spectrum between natural real and diffusion-generated images, as shown at the top (a). Based on this analysis, we design a frequency-selective function $w(f)$ that serves as a filter bank on the Fourier spectrum, suppressing the less discriminative frequency bands and enhancing the more discriminative ones to yield a more discriminative representation, as shown at the bottom (b).
  • Figure 4: Spectrum discrepancy between natural real and diffusion-generated images in (a) and (b). We further design the frequency-selective function based on the discrepancy as shown in (c).
  • Figure 5: Robustness results under unseen perturbations. Average precision (AP) of different methods when detecting real/fake images from ProGAN under three types of perturbations, each at three severity levels: Gaussian Noise ($\sigma = 0.001, 0.005, 0.01$), Gaussian Blur ($\sigma = 1, 2, 3$), and JPEG Compression ($quality = 75, 50, 25$) (from left to right).
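The perturbation settings listed for Figure 5 can be mimicked with a NumPy-only sketch. It assumes grayscale images with intensities in [0, 1] (the caption does not state the intensity scale), truncates the Gaussian kernel at 4σ with reflect padding, and omits JPEG compression, which needs an image codec (for example, Pillow's `Image.save(..., format="JPEG", quality=q)`). The function names and the `SEVERITIES` table are hypothetical.

```python
import numpy as np

# Severity levels listed in the Figure 5 caption (JPEG quality 75/50/25 is not implemented here).
SEVERITIES = {
    "gaussian_noise": [0.001, 0.005, 0.01],  # noise standard deviation
    "gaussian_blur": [1, 2, 3],  # blur sigma in pixels
}


def add_gaussian_noise(image: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise and clip back to the assumed [0, 1] range."""
    return np.clip(image + rng.normal(0.0, sigma, size=image.shape), 0.0, 1.0)


def gaussian_kernel1d(sigma: float, truncate: float = 4.0) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at `truncate * sigma` pixels."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D image with reflect padding; output keeps the input shape."""
    kernel = gaussian_kernel1d(sigma)
    pad = len(kernel) // 2
    padded = np.pad(image, pad, mode="reflect")
    # Convolve every row, then every column; "valid" mode trims the padding back off.
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32))
    for sigma in SEVERITIES["gaussian_noise"]:
        print("noise", sigma, add_gaussian_noise(img, sigma, rng).shape)
    for sigma in SEVERITIES["gaussian_blur"]:
        print("blur", sigma, gaussian_blur(img, sigma).shape)
```

A separable blur keeps the sketch dependency-free; a library routine such as SciPy's Gaussian filter would be the usual choice in practice, although its boundary handling may differ slightly.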