Diffusion Facial Forgery Detection

Harry Cheng, Yangyang Guo, Tianyi Wang, Liqiang Nie, Mohan Kankanhalli

TL;DR

This work addresses the rising threat of diffusion-generated facial forgeries by introducing DiFF, a large-scale dataset of over 500,000 forged images generated by 13 diffusion methods under 4 conditions, guided by about 30k textual prompts and 10k visual prompts across 1,070 identities. The authors benchmark both human observers and existing detectors on DiFF, finding that detectors struggle to generalize to diffusion-based facial forgeries and that human performance is often near chance. To improve robustness, they propose Edge Graph Regularization (EGR), which adds edge-graph cues to training through the regularized objective $R_{\mathcal{S}}(\theta) = \hat{R}_{\mathcal{S}}(\theta) + \lambda \frac{1}{n} \sum_{i=1}^n \ell(\theta, \mathbf{E}_i, y_i)$, where $\hat{R}_{\mathcal{S}}(\theta)$ is the empirical risk of the detector $\theta$ on the training set $\mathcal{S}$ of $n$ images, $\mathbf{E}_i$ is the edge graph of the $i$-th image with label $y_i$, and $\lambda$ weights the edge term. EGR yields an average AUC improvement of about 10% across detectors and substantial gains in cross-domain and face-editing (FE) scenarios. DiFF provides a comprehensive benchmark and a practical technique for improving diffusion-forgery detection, with implications for detection systems, policy, and the responsible deployment of generative technologies.
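The objective above states the EGR loss but not how the edge graph $\mathbf{E}_i$ is built or what the detector is. The Python sketch below is a minimal illustration under stated assumptions: the edge graph is approximated by a Sobel gradient-magnitude map, and the detector $\theta$ is a toy logistic model over flattened pixels. Both choices, and the helper names (`sobel_edge_map`, `bce_with_logits`, `egr_objective`), are hypothetical and not taken from the paper. What it does show from the formula is that the same parameters $\theta$ are scored on each image and on its edge graph with the same label $y_i$, and the edge term is scaled by $\lambda$.

```python
import math
from typing import List, Sequence

Image = List[List[float]]  # grayscale image, one list of pixel intensities per row

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edge_map(img: Image) -> Image:
    """Assumed stand-in for the edge graph E_i: Sobel gradient magnitude.

    Border pixels are left at 0 to keep the sketch short.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(SOBEL_X[i][j] * img[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * img[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            out[r][c] = math.hypot(gx, gy)
    return out


def flatten(img: Image) -> List[float]:
    return [p for row in img for p in row]


def bce_with_logits(weights: Sequence[float], bias: float, x: Sequence[float], y: int) -> float:
    """Loss l(theta, x, y) of a hypothetical linear detector theta = (weights, bias).

    Numerically stable binary cross-entropy computed from the logit z.
    """
    z = bias + sum(w * v for w, v in zip(weights, x))
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def egr_objective(weights: Sequence[float], bias: float,
                  images: Sequence[Image], labels: Sequence[int], lam: float) -> float:
    """Empirical risk on the images plus lam times the mean loss on their edge maps."""
    n = len(images)
    image_risk = sum(bce_with_logits(weights, bias, flatten(x), y)
                     for x, y in zip(images, labels)) / n
    # Same detector parameters and same labels, but evaluated on edge maps.
    edge_risk = sum(bce_with_logits(weights, bias, flatten(sobel_edge_map(x)), y)
                    for x, y in zip(images, labels)) / n
    return image_risk + lam * edge_risk
```

For an all-zero detector every logit is 0, so each loss equals ln 2 and the objective reduces to (1 + λ)·ln 2; in a real setup $\theta$ would be a deep detector trained by minimizing this quantity with gradient descent.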

Abstract

Detecting diffusion-generated images has recently emerged as an active research area. Existing diffusion-based datasets predominantly focus on general image generation. However, facial forgeries, which pose a more severe social risk, have so far received less attention. To address this gap, this paper introduces DiFF, a comprehensive dataset dedicated to face-focused diffusion-generated images. DiFF comprises over 500,000 images synthesized using thirteen distinct generation methods under four conditions. In particular, the dataset leverages 30,000 carefully collected textual and visual prompts to ensure that the synthesized images have both high fidelity and semantic consistency. We conduct extensive experiments on DiFF with a human test and several representative forgery detection methods. The results show that the binary detection accuracy of both human observers and automated detectors often falls below 30%, highlighting the difficulty of detecting diffusion-generated facial forgeries. Furthermore, we propose an edge graph regularization approach that effectively enhances the generalization capability of existing detectors.
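The TL;DR reports gains in AUC, while the abstract reports binary detection accuracy. The two metrics can diverge: accuracy depends on a fixed decision threshold, whereas AUC only measures how well forged images are ranked above pristine ones. The following Python sketch computes both with their standard definitions, assuming label 1 means forged and a 0.5 threshold; it is a generic illustration, not the paper's evaluation code.

```python
from typing import Sequence


def binary_accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label (1 = forged)."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random forged sample outscores a random
    pristine one, counting ties as 1/2 (the Mann-Whitney U formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

On scores [0.9, 0.4, 0.35, 0.1] with labels [1, 1, 0, 0], every forged image outranks every pristine one (AUC = 1.0), yet thresholding at 0.5 misclassifies the forged image scored 0.4 (accuracy = 0.75). This is one way a detector's scores can shift on unseen generators and hurt thresholded accuracy more than ranking quality.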

Paper Structure

This paper contains 19 sections, 3 equations, 8 figures, and 10 tables.

Figures (8)

  • Figure 1: DiFF, a diffusion-generated facial forgery dataset encompassing over half a million images. The dataset contains manipulated images created by thirteen state-of-the-art methods under four distinct conditions. The dataset will be released at https://github.com/xaCheng1996/DiFF.
  • Figure 2: Gender and age-group distribution of the pristine and forgery subsets. Within each subset, the percentage for each age (ranging from 20 to 60) is calculated separately for males (blue bars) and females (red bars).
  • Figure 3: Pipeline of prompt construction and modification.
  • Figure 4: Word cloud of the 200 most frequent content words in $\mathcal{P}^t_{ori}$. Each word is sized by its frequency.
  • Figure 5: Facial forgery generation under four conditions.
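Figure 4's caption describes a word cloud built from the 200 most frequent content words in $\mathcal{P}^t_{ori}$, with each word sized by its frequency. A minimal Python sketch of that counting and sizing step follows; the regex tokenizer, the tiny stopword list, the linear font scaling, and the function names are all assumptions for illustration, not the authors' procedure.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "with", "and", "in", "on", "is", "at"}


def top_content_words(prompts: Iterable[str], k: int = 200) -> List[Tuple[str, int]]:
    """Count lowercase alphabetic tokens, drop stopwords, keep the k most frequent."""
    counts = Counter(
        tok
        for text in prompts
        for tok in re.findall(r"[a-z]+", text.lower())
        if tok not in STOPWORDS
    )
    return counts.most_common(k)


def font_sizes(ranked: List[Tuple[str, int]],
               min_pt: float = 10.0, max_pt: float = 60.0) -> Dict[str, float]:
    """Scale each word's font size linearly between min_pt and max_pt by frequency."""
    if not ranked:
        return {}
    hi = ranked[0][1]   # most_common() returns counts in descending order
    lo = ranked[-1][1]
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {w: min_pt + (c - lo) / span * (max_pt - min_pt) for w, c in ranked}
```

Linear scaling keeps the sketch simple; word-cloud tools often use other mappings, so the exact sizes in the figure may differ.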