On the detection of synthetic images generated by diffusion models

Riccardo Corvi, Davide Cozzolino, Giada Zingarini, Giovanni Poggi, Koki Nagano, Luisa Verdoliva

TL;DR

This work addresses the challenge of detecting synthetic images generated by diffusion models (DMs) and investigates whether DM outputs carry forensic fingerprints similar to those left by GANs. It analyzes artifact traces via noise-residual and Fourier-spectral analyses and benchmarks several state-of-the-art detectors, trained on GAN images, under both ideal and social-media-like conditions. Results show that some DMs leave visible fingerprints, but detector generalization to unseen DM architectures is limited, with performance highly dependent on the training data and on image processing, even after simple fusion and calibration. The findings underscore the need for diffusion-model-aware forensic tools and more robust detection strategies in realistic pipelines.

Abstract

Over the past decade, there has been tremendous progress in creating synthetic media, mainly thanks to the development of powerful methods based on generative adversarial networks (GAN). Very recently, methods based on diffusion models (DM) have been gaining the spotlight. In addition to providing an impressive level of photorealism, they enable the creation of text-based visual content, opening up new and exciting opportunities in many different application fields, from arts to video games. On the other hand, this property is an additional asset in the hands of malicious users, who can generate and distribute fake media perfectly adapted to their attacks, posing new challenges to the media forensic community. With this work, we seek to understand how difficult it is to distinguish synthetic images generated by diffusion models from pristine ones and whether current state-of-the-art detectors are suitable for the task. To this end, we first expose the forensic traces left by diffusion models, then study how current detectors, developed for GAN-generated images, perform on these new synthetic images, especially in challenging social-network scenarios involving image compression and resizing. Datasets and code are available at github.com/grip-unina/DMimageDetection.

Paper Structure (6 sections, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Synthetic images generated using recent text-to-image models: DALL·E 2 [ramesh2022hierarchical], Stable Diffusion [stablediffusion2022] and GLIDE [nichol2021glide].
  • Figure 2: Fourier transform (amplitude) of the artificial fingerprint estimated from 1000 image residuals. Top row, from left to right: ProGAN [karras2018progressive], BigGAN [brock2018large], StyleGAN2 [karras2020analyzing], Taming Transformers [esser2021taming], DALL·E Mini [Dayma_DALLE_Mini_2021]. Bottom row: GLIDE [nichol2021glide], Latent Diffusion [rombach2022high], Stable Diffusion [stablediffusion2022], ADM [dhariwal2021diffusion], DALL·E 2 [ramesh2022hierarchical].
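
The fingerprint analysis summarized in the Figure 2 caption (average many noise residuals, then inspect the amplitude of the Fourier transform) can be illustrated with a minimal sketch. This is not the authors' code: the 3x3 box filter is only a stand-in for a proper denoiser, the function names `noise_residual` and `fingerprint_spectrum` are hypothetical, and the toy "generator" is random noise plus a weak periodic pattern, chosen so that the expected spectral peak is known in advance.

```python
import numpy as np


def noise_residual(image):
    """Return image minus a 3x3 box-filtered copy (assumed stand-in denoiser)."""
    h, w = image.shape
    padded = np.pad(image, 1, mode="reflect")
    smooth = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            smooth += padded[dy:dy + h, dx:dx + w]
    smooth /= 9.0
    return image - smooth


def fingerprint_spectrum(images):
    """Average the residuals of all images and return (fingerprint, |FFT|)."""
    residuals = [noise_residual(np.asarray(img, dtype=np.float64)) for img in images]
    # Image content averages out; a pattern shared by all images survives.
    fingerprint = np.mean(residuals, axis=0)
    # Centered amplitude spectrum: zero frequency sits at (h // 2, w // 2).
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(fingerprint)))
    return fingerprint, spectrum


# Toy data: every "synthetic" image shares a weak horizontal pattern of period 8.
rng = np.random.default_rng(0)
h = w = 64
yy, xx = np.mgrid[0:h, 0:w]
pattern = 0.5 * np.cos(2 * np.pi * xx / 8)
images = [rng.normal(0.0, 1.0, (h, w)) + pattern for _ in range(200)]

fingerprint, spectrum = fingerprint_spectrum(images)
peak = np.unravel_index(np.argmax(spectrum), spectrum.shape)
print("strongest spectral peak at (row, col):", peak)
```

In this toy setting the shared pattern shows up as a pair of isolated peaks offset horizontally from the center of the spectrum, which is the kind of regular structure one would look for in plots like those of Figure 2. Real images would need a stronger denoiser and per-channel handling, both of which are omitted here.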