Towards the Detection of AI-Synthesized Human Face Images
Yuhang Lu, Touradj Ebrahimi
TL;DR
This work tackles the problem of detecting entirely AI-synthesized human face images produced by both GANs and diffusion models. It introduces a large benchmark spanning seven generative models, evaluates the generalization and robustness of existing detectors, and analyzes forgery traces in the frequency domain. The key finding is that detectors trained on general fake images struggle to generalize to synthetic faces, while detectors trained on frequency representations offer markedly better cross-model performance. The study also shows that frequency-based detectors, especially when paired with architectures such as EfficientNetB4, can surpass prior methods on several model families, although robustness to strong perturbations remains a challenge. Overall, the paper contributes a benchmark dataset, insights into spectral artifacts, and a practical frequency-domain detection approach, with implications for real-world fake-face monitoring and attribution.
Abstract
In recent years, image generation and manipulation have made remarkable progress thanks to the rapid development of generative AI based on deep learning. Recent studies have devoted significant effort to the problem of face image manipulation caused by deepfake techniques. However, the problem of detecting purely synthesized face images has been explored to a lesser extent. In particular, the recently popularized Diffusion Models (DMs) have shown remarkable success in image synthesis. Existing detectors struggle to generalize across synthesized images created by different generative models. In this work, a comprehensive benchmark of human face images produced by Generative Adversarial Networks (GANs) and a variety of DMs has been established to evaluate both the generalization ability and the robustness of state-of-the-art detectors. The forgery traces introduced by different generative models are then analyzed in the frequency domain to draw insights into their spectral characteristics. The paper further demonstrates that a detector trained on a frequency representation can generalize well to unseen generative models.
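
As a rough illustration of what a frequency representation of an image can look like, the sketch below computes a centered, normalized log-magnitude spectrum with a 2-D discrete Fourier transform. The abstract does not state which transform or preprocessing the authors use, so the choice of the DFT, the grayscale input, the [0, 1] scaling, and the function name `log_magnitude_spectrum` are all assumptions made for illustration only.

```python
import numpy as np


def log_magnitude_spectrum(image: np.ndarray) -> np.ndarray:
    """Return the centered log-magnitude spectrum of a grayscale image.

    Hypothetical helper: the transform and scaling are assumptions,
    not the preprocessing described in the paper.
    """
    if image.ndim != 2:
        raise ValueError("expected a 2-D grayscale array")

    # 2-D DFT, with the zero-frequency (DC) term shifted to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(image.astype(np.float64)))

    # log(1 + |F|) compresses the large dynamic range of the magnitudes.
    log_mag = np.log1p(np.abs(spectrum))

    # Min-max scale to [0, 1] so the result can be fed to a classifier.
    span = log_mag.max() - log_mag.min()
    if span == 0:
        return np.zeros_like(log_mag)
    return (log_mag - log_mag.min()) / span


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_face = rng.random((256, 256))  # stand-in for a grayscale face image
    features = log_magnitude_spectrum(fake_face)
    print(features.shape, features.min(), features.max())
```

A map like this could be passed to an image classifier in place of the pixel-domain input, which is one plausible reading of "a detector trained with frequency representation".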
