FIND: A Simple yet Effective Baseline for Diffusion-Generated Image Detection

Jie Li, Yingying Feng, Chi Xie, Jie Hu, Lei Tan, Jiayi Ji

Abstract

The remarkable realism of images generated by diffusion models poses critical detection challenges. Current methods utilize reconstruction error as a discriminative feature, exploiting the observation that real images exhibit higher reconstruction errors when processed through diffusion models. However, these approaches require costly reconstruction computations and depend on specific diffusion models, making their performance highly model-dependent. We identify a fundamental difference: real images are more difficult to fit with Gaussian distributions compared to synthetic ones. In this paper, we propose Forgery Identification via Noise Disturbance (FIND), a novel method that requires only a simple binary classifier. It eliminates reconstruction by directly targeting the core distributional difference between real and synthetic images. Our key operation is to add Gaussian noise to real images during training and label these noisy versions as synthetic. This step allows the classifier to focus on the statistical patterns that distinguish real from synthetic images. We theoretically prove that the noise-augmented real images resemble diffusion-generated images in their ease of Gaussian fitting. Furthermore, simply by adding noise, they still retain visual similarity to the original images, highlighting the most discriminative distribution-related features. The proposed FIND improves performance by 11.7% on the GenImage benchmark while running 126x faster than existing methods. By removing the need for auxiliary diffusion models and reconstruction, it offers a practical, efficient, and generalizable way to detect diffusion-generated content.
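
The key operation described in the abstract, adding Gaussian noise to real images and labeling the noisy copies as synthetic, can be sketched as a batch-construction step. This is a minimal illustration rather than the authors' implementation: the function name `make_find_batch`, the label convention, the clipping step, the choice to mix in genuinely generated images, and the reading of `eps` as a standard deviation on the 0-255 pixel scale are all assumptions made for this example.

```python
import numpy as np

REAL, SYNTHETIC = 0, 1  # label convention assumed for this sketch


def make_find_batch(real, fake, eps=50.0, rng=None):
    """Build one training batch in which noisy real images count as synthetic.

    real, fake: float arrays of shape (N, H, W, C), pixel values in [0, 255].
    eps: noise scale. Treating it as the Gaussian standard deviation on the
         0-255 scale is an assumption, not something the abstract states.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Perturb every real image with zero-mean Gaussian noise.
    noisy_real = real + rng.normal(0.0, eps, size=real.shape)
    # Clipping back to the valid pixel range is also an assumption.
    noisy_real = np.clip(noisy_real, 0.0, 255.0)

    images = np.concatenate([real, noisy_real, fake], axis=0)
    labels = np.concatenate([
        np.full(len(real), REAL),
        np.full(len(noisy_real), SYNTHETIC),  # key step: noisy real -> synthetic
        np.full(len(fake), SYNTHETIC),
    ])
    return images, labels
```

The returned `images` and `labels` can be fed to any binary classifier. No diffusion model or reconstruction pass appears in the loop, which is the efficiency argument the abstract makes.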

Paper Structure (14 sections, 9 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: Comparison between FIND and previous noise-based methods. (a) DIRE performs a complete reconstruction by adding noise and then denoising it with a Diffusion Model (DM) over multiple steps; (b) LaRE$^2$ compresses the noise addition and denoising steps into a single step within the latent space; (c) Our method only adds noise and skips the denoising process. Compared with previous methods, FIND avoids explicit reconstruction and removes the dependence on a DM.
  • Figure 2: The training framework of FIND. Gaussian noise is added to real images, and these perturbed versions are labeled as synthetic during training. This removes FIND's dependency on a reconstruction model and lets it learn the core distributional differences between real and synthetic images, resulting in superior generalization ability.
  • Figure 3: Visual illustration of Gaussian distributions with increased variances. Left panel: A joint distribution (green) formed from three Gaussians, with its two-component Gaussian mixture model fit (red). Right panel: Increasing the variances smooths the joint distribution, causing the two left-hand Gaussians to nearly merge. The drop in fitting MSE indicates that distributions with increased variance are easier to fit.
  • Figure 4: Cross-validation results across different training and testing subsets. We report the accuracy for DIRE, LaRE$^{2}$ and Ours on all 8 generators, where each row represents a corresponding training subset. Our method shows consistent and superior performance on different training sets and test sets.
  • Figure 5: The impact of $\epsilon$. The accuracy surges as $\epsilon$ rises initially and starts fluctuating when $\epsilon$ exceeds 20. The quality of the images deteriorates rapidly as $\epsilon$ increases. We pick a balanced value of $\epsilon=50$.
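
The intuition behind Figure 3, that increasing the variance makes a multi-modal distribution easier to fit, can be checked numerically with a simplified one-dimensional experiment. The sketch below is not the figure's setup: it replaces the two-component Gaussian mixture fit with a single moment-matched Gaussian, and the function names, component means, standard deviations, and grid are hypothetical values chosen for illustration.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def single_gaussian_fit_mse(means, std, noise_std=0.0, grid=None):
    """MSE between an equal-weight 1-D Gaussian mixture and its moment-matched
    single-Gaussian fit, after adding independent N(0, noise_std^2) noise.

    The single-Gaussian fit is a simplification of the two-component fit
    shown in Figure 3; all numeric choices here are illustrative.
    """
    means = np.asarray(means, dtype=float)
    x = np.linspace(-20.0, 20.0, 4001) if grid is None else grid

    # Adding independent Gaussian noise adds its variance to every component.
    comp_std = np.sqrt(std ** 2 + noise_std ** 2)
    mixture = np.mean([gaussian_pdf(x, m, comp_std) for m in means], axis=0)

    # Moment matching: mixture mean, and within- plus between-component variance.
    fit_std = np.sqrt(comp_std ** 2 + means.var())
    fit = gaussian_pdf(x, means.mean(), fit_std)

    return float(np.mean((mixture - fit) ** 2))


sharp = single_gaussian_fit_mse([-2.0, 0.0, 3.0], std=0.5)
smooth = single_gaussian_fit_mse([-2.0, 0.0, 3.0], std=0.5, noise_std=2.0)
print(f"fit MSE without noise: {sharp:.3e}, with noise: {smooth:.3e}")
```

Both calls use the same evaluation grid, so the two MSE values are directly comparable. The noisy mixture is smoother and closer to unimodal, which is why its fitting error comes out lower in this toy setting.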