Towards the Detection of AI-Synthesized Human Face Images

Yuhang Lu, Touradj Ebrahimi

TL;DR

This work tackles the problem of detecting entirely AI-synthesized human face images from both GANs and diffusion models. It introduces a large benchmark across seven generative models, evaluates the generalization and robustness of existing detectors, and analyzes forgery traces in the frequency domain. The key finding is that detectors trained on general fake images struggle to generalize to synthetic faces, while detectors trained on frequency representations offer markedly better cross-model performance. The study also demonstrates that frequency-based detectors, especially when paired with architectures like EfficientNetB4, can surpass prior methods on several model families, though robustness to strong perturbations remains a challenge. Overall, the paper contributes a benchmark dataset, insights into spectral artifacts, and a practical frequency-domain detection approach with implications for real-world fake-face monitoring and attribution.

Abstract

Over the past few years, image generation and manipulation have achieved remarkable progress due to the rapid development of generative AI based on deep learning. Recent studies have devoted significant effort to addressing the problem of face image manipulation caused by deepfake techniques. However, the problem of detecting purely synthesized face images has been explored to a lesser extent. In particular, the recently popular Diffusion Models (DMs) have shown remarkable success in image synthesis. Existing detectors struggle to generalize between synthesized images created by different generative models. In this work, a comprehensive benchmark including human face images produced by Generative Adversarial Networks (GANs) and a variety of DMs has been established to evaluate both the generalization ability and robustness of state-of-the-art detectors. Then, the forgery traces introduced by different generative models have been analyzed in the frequency domain to draw various insights. The paper further demonstrates that a detector trained with frequency representation can generalize well to other unseen generative models.
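To make the idea of a "frequency representation" concrete, here is a minimal sketch of one common way to obtain it: a centered log-amplitude spectrum from a 2D FFT, averaged over a set of images. This is an illustration under assumptions, not the authors' pipeline. The grayscale conversion, the log scaling, and the function names `log_spectrum` and `mean_spectrum` are all hypothetical choices that this summary does not specify.

```python
import numpy as np


def log_spectrum(image: np.ndarray) -> np.ndarray:
    """Centered log-amplitude spectrum of one image (assumed preprocessing)."""
    if image.ndim == 3:
        # Assumption: collapse color channels by averaging them.
        image = image.mean(axis=2)
    # 2D FFT, then move the zero-frequency (DC) term to the array center.
    spectrum = np.fft.fftshift(np.fft.fft2(image.astype(np.float64)))
    # log1p compresses the dynamic range and keeps values non-negative.
    return np.log1p(np.abs(spectrum))


def mean_spectrum(images) -> np.ndarray:
    """Average the log spectra of same-sized images."""
    return np.mean([log_spectrum(img) for img in images], axis=0)
```

Under these assumptions, a detector would take the output of `log_spectrum` as input instead of raw pixels, and `mean_spectrum` computed per generator gives the kind of averaged view used to compare spectral artifacts across models.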

Paper Structure (15 sections, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Realistic synthetic human face images generated by ProGAN [karras2017progressive], StyleGAN2 [Karras2019stylegan2], DDPM [ho2020denoising], DDIM [song2020denoising], PNDM [liu2022pseudo], and LDM [rombach2022high], respectively.
  • Figure 2: Mean frequency spectra of real images from CelebA-HQ [karras2017progressive] and synthetic human face images created by three GAN models, namely ProGAN [karras2017progressive], StyleGAN2 [Karras2019stylegan2], and VQGAN [esser2021taming].
  • Figure 3: Mean frequency spectra of real images from CelebA-HQ [karras2017progressive] and synthetic human face images created by four diffusion models, namely DDPM [ho2020denoising], DDIM [song2020denoising], PNDM [liu2022pseudo], and LDM [rombach2022high].
  • Figure 4: Performance of various detectors under the perturbation of JPEG compression, Gaussian blur, Gaussian noise, and resizing operation (from left to right). The evaluation is conducted on two test sets created by ProGAN and DDIM respectively.
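
The perturbations named in Figure 4 can be sketched with NumPy alone, as a rough illustration of what a robustness test applies to each image before detection. Everything here is an assumption: the function names, the parameter choices, and the nearest-neighbor resizing are hypothetical, and the paper's actual perturbation settings are not given in this summary. JPEG compression is left out because it needs an image codec rather than plain array operations.

```python
import numpy as np


def gaussian_noise(image: np.ndarray, std: float, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise and clip to the 8-bit range [0, 255]."""
    noisy = image.astype(np.float64) + rng.normal(0.0, std, size=image.shape)
    return np.clip(noisy, 0.0, 255.0)


def gaussian_blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur over the two spatial axes, reflect-padded."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()  # normalize so overall brightness is preserved
    out = image.astype(np.float64)
    for axis in (0, 1):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="reflect")
        # 'valid' convolution on the padded signal restores the original length.
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
        )
    return out


def resize_nearest(image: np.ndarray, scale: float) -> np.ndarray:
    """Nearest-neighbor resize by a scale factor (a simple stand-in)."""
    h, w = image.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows][:, cols]
```

A robustness sweep in this style would apply one perturbation at increasing strength (larger `std`, larger `sigma`, smaller `scale`) to every test image and record detector accuracy at each level.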