Towards the Detection of Diffusion Model Deepfakes
Jonas Ricker, Simon Damm, Thorsten Holz, Asja Fischer
TL;DR
This study evaluates how well deepfakes generated by diffusion models (DMs) can be detected and contrasts this with GAN-based deepfakes. It finds that existing GAN detectors struggle on DM-generated images, but that re-training the same detectors on DM data yields near-perfect detection that also generalizes to GANs. A feature-space analysis suggests that DM-generated images contain fewer detectable artifacts, which would explain both their weaker detectability and the asymmetric transfer between the two model families. Frequency-domain investigations show that DM-generated images lack the grid-like artifacts typical of GANs but tend to underestimate high frequencies, which the authors attribute to the learning objective. Overall, the work is a first step towards DM-specific deepfake detection, with detector re-training and high-frequency characteristics as the two leads it identifies.
Abstract
Over the past few years, diffusion models (DMs) have reached an unprecedented level of visual quality. However, relatively little attention has been paid to the detection of DM-generated images, which is critical to prevent adverse impacts on our society. In contrast, generative adversarial networks (GANs) have been extensively studied from a forensic perspective. In this work, we therefore take the natural next step to evaluate whether previous methods can be used to detect images generated by DMs. Our experiments yield two key findings: (1) state-of-the-art GAN detectors are unable to reliably distinguish real from DM-generated images, but (2) re-training them on DM-generated images allows for almost perfect detection, which remarkably even generalizes to GANs. Together with a feature space analysis, our results lead to the hypothesis that DMs produce fewer detectable artifacts and are thus more difficult to detect than GANs. One possible reason for this is the absence of grid-like frequency artifacts in DM-generated images, which are a known weakness of GANs. However, we make the interesting observation that diffusion models tend to underestimate high frequencies, which we attribute to the learning objective.
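The frequency claims above can be made concrete with a radially averaged power spectrum, a common way to compare how much energy images carry at each spatial frequency. The following is a minimal sketch under stated assumptions, not the authors' analysis pipeline: the function name `radial_power_spectrum` is hypothetical, the input is assumed to be a single-channel (grayscale) image, and only NumPy is used.

```python
import numpy as np

def radial_power_spectrum(image):
    """Azimuthally averaged power spectrum of a 2D grayscale image.

    Hypothetical helper for illustration; not taken from the paper.
    Returns mean spectral power for integer radii 0 .. min(h, w)//2 - 1.
    """
    image = np.asarray(image, dtype=np.float64)
    # 2D DFT, with the zero-frequency component shifted to the center
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    power = np.abs(spectrum) ** 2
    h, w = power.shape
    # Integer (floored) distance of every frequency bin from the center
    y, x = np.indices((h, w))
    radius = np.hypot(y - h // 2, x - w // 2).astype(int)
    # Average the power over all bins that share the same radius
    sums = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.bincount(radius.ravel())
    profile = sums / counts
    # Keep only radii that are fully contained in the spectrum
    return profile[: min(h, w) // 2]
```

Averaging such profiles over many real and many generated images, and comparing the tails of the two curves, is one way to check whether a generator underestimates high frequencies. Grid-like artifacts, by contrast, show up as isolated peaks in the full 2D spectrum and are largely smoothed out by the radial average.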
