Towards the Detection of Diffusion Model Deepfakes

Jonas Ricker, Simon Damm, Thorsten Holz, Asja Fischer

TL;DR

This study evaluates how well images generated by diffusion models (DMs) can be detected and contrasts them with GAN-generated images. Existing GAN detectors fail to reliably distinguish real from DM-generated images, but re-training them on DM-generated images yields almost perfect detection that even generalizes to GANs. Together with a feature-space analysis, these results suggest that DM-generated images contain fewer detectable artifacts, which would explain why they are harder to detect. In the frequency domain, DM-generated images lack the grid-like artifacts typical of GANs but tend to underestimate high frequencies, which the authors attribute to the learning objective; this points to high-frequency content as a possible avenue for future detection methods. Overall, the work lays groundwork for DM-specific deepfake detection based on detector re-training and frequency characteristics.

Abstract

In the course of the past few years, diffusion models (DMs) have reached an unprecedented level of visual quality. However, relatively little attention has been paid to the detection of DM-generated images, which is critical to prevent adverse impacts on our society. In contrast, generative adversarial networks (GANs) have been extensively studied from a forensic perspective. In this work, we therefore take the natural next step to evaluate whether previous methods can be used to detect images generated by DMs. Our experiments yield two key findings: (1) state-of-the-art GAN detectors are unable to reliably distinguish real from DM-generated images, but (2) re-training them on DM-generated images allows for almost perfect detection, which remarkably even generalizes to GANs. Together with a feature space analysis, our results lead to the hypothesis that DMs produce fewer detectable artifacts and are thus more difficult to detect compared to GANs. One possible reason for this is the absence of grid-like frequency artifacts in DM-generated images, which are a known weakness of GANs. However, we make the interesting observation that diffusion models tend to underestimate high frequencies, which we attribute to the learning objective.

Paper Structure

This paper contains 51 sections, 8 equations, 25 figures, and 5 tables.

Figures (25)

  • Figure 1: Detection performance for re-trained detectors. The columns GAN, DM, and All correspond to models trained on samples from all GANs, all DMs, and both, respectively.
  • Figure 2: Feature space visualization for the detector Wang2020 via t-SNE of real and generated images in two dimensions. The features correspond to the representation prior to the last fully-connected layer of the given detector.
  • Figure 3: Mean DFT spectrum of real and generated images. To increase visibility, the color bar is limited to $[10^{-5}, 10^{-1}]$, with values lying outside this interval being clipped.
  • Figure 4: Mean reduced spectrum of real and generated images. The part of the spectrum where GAN-characteristic discrepancies occur is magnified.
  • Figure 5: Spectral density error $\tilde{S}_\text{err}$ throughout the denoising process. The error is computed relative to the spectrum of real images. We display the error for (a) all sampling steps and (b) a close-up of the last 100 steps. The colorbar is clipped at -1 and 1.
  • ...and 20 more figures
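
The frequency quantities named in the captions of Figures 3 to 5 (mean DFT spectrum, reduced spectrum, and an error relative to the spectrum of real images) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration rather than the authors' code: the function names are hypothetical, the spectrum is taken as the squared DFT magnitude, the reduced spectrum as an azimuthal average over integer radii, and the error as the ratio of generated to real spectrum minus one. The "generated" images are a synthetic stand-in (blurred noise), not the output of any diffusion model.

```python
import numpy as np


def mean_power_spectrum(images):
    """Mean centered 2D power spectrum of grayscale images with shape (N, H, W)."""
    freq = np.fft.fftshift(np.fft.fft2(images, norm="ortho"), axes=(-2, -1))
    return (np.abs(freq) ** 2).mean(axis=0)


def reduced_spectrum(spectrum_2d):
    """Azimuthally average a centered 2D spectrum into a 1D radial profile."""
    h, w = spectrum_2d.shape
    y, x = np.indices((h, w))
    # Integer distance of every frequency bin from the zero-frequency bin.
    radius = np.hypot(y - h // 2, x - w // 2).astype(int)
    sums = np.bincount(radius.ravel(), weights=spectrum_2d.ravel())
    counts = np.bincount(radius.ravel())
    profile = sums / np.maximum(counts, 1)
    # Keep radii up to the axis Nyquist frequency; the corners are sparsely sampled.
    return profile[: min(h, w) // 2 + 1]


def relative_spectral_error(generated, real):
    """Relative deviation of the generated profile from the real one (assumed form)."""
    gen_profile = reduced_spectrum(mean_power_spectrum(generated))
    real_profile = reduced_spectrum(mean_power_spectrum(real))
    return gen_profile / real_profile - 1.0


rng = np.random.default_rng(0)
real = rng.standard_normal((64, 32, 32))
# Hypothetical stand-in for generated images: a 2x2 circular box blur,
# which damps high frequencies while leaving the mean (zero frequency) intact.
generated = 0.25 * (
    real
    + np.roll(real, 1, axis=1)
    + np.roll(real, 1, axis=2)
    + np.roll(real, 1, axis=(1, 2))
)
err = relative_spectral_error(generated, real)
```

In this toy setup `err` stays near zero at the lowest frequency and becomes strongly negative towards the highest frequencies, which mirrors only qualitatively the underestimation of high frequencies described in the abstract. A real analysis would load actual real and generated images in place of the synthetic arrays.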