Masked Conditional Diffusion Model for Enhancing Deepfake Detection

Tiewen Chen, Shanmin Yang, Shu Hu, Zhenghan Fang, Ying Fu, Xi Wu, Xin Wang

TL;DR

The paper addresses the poor cross-dataset generalization of deepfake detectors by augmenting training data with a Masked Conditional Diffusion Model (MCDM) that inpaints masked facial regions to produce diverse, high-quality forgeries. MCDM uses a random mask generator and a diffusion process conditioned on masked real images, optimizing a joint pixel- and feature-level loss to preserve semantic content while promoting robustness. Empirical results on FF++ and cross-dataset tests (CDF, DFD) show superior intra-dataset AUC and meaningful cross-dataset gains, with ablations confirming the benefits of mask-conditioning and feature reconstruction loss. This approach enhances detection robustness and generalization, offering a practical pathway to more reliable deepfake defense systems with diffusion-based augmentation.
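The masking step summarized above, in which part of a real face is hidden and the visible remainder becomes the condition for inpainting, can be sketched as follows. This is a minimal NumPy sketch, not the paper's random mask generator: it assumes a single random rectangle per image, and the function names `random_box_mask` and `mask_image`, the rectangle shape, and the box-size range (`min_frac`, `max_frac`) are all hypothetical.

```python
import numpy as np

def random_box_mask(height, width, rng, min_frac=0.25, max_frac=0.5):
    """Return a binary mask (1 = region to regenerate) containing one random box.

    Hypothetical stand-in for MCDM's random mask generator: the rectangular
    shape and the size range are assumptions, not the paper's settings.
    """
    mask = np.zeros((height, width), dtype=np.float32)
    # Box side lengths drawn uniformly between min_frac and max_frac of the image size.
    box_h = int(rng.integers(int(min_frac * height), int(max_frac * height) + 1))
    box_w = int(rng.integers(int(min_frac * width), int(max_frac * width) + 1))
    # Top-left corner chosen so the box always fits inside the image.
    top = int(rng.integers(0, height - box_h + 1))
    left = int(rng.integers(0, width - box_w + 1))
    mask[top:top + box_h, left:left + box_w] = 1.0
    return mask

def mask_image(image, mask):
    """Zero out the masked region of an HxWxC image; the rest is kept as the condition."""
    return image * (1.0 - mask)[..., None]

rng = np.random.default_rng(0)
real = rng.random((64, 64, 3), dtype=np.float32)  # toy "real face" with values in [0, 1)
m = random_box_mask(64, 64, rng)
cond = mask_image(real, m)                         # masked real image used as the condition
```

Drawing a fresh mask for every training sample is what lets one pristine face yield many different forgeries, since the diffusion model must invent different content for each hidden region.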

Abstract

Recent studies on deepfake detection have achieved promising results when training and testing faces come from the same dataset. However, their performance degrades severely when confronted with forged samples that the model has not seen during training. In this paper, we use generated deepfake data to help detect deepfakes: we present a new insight into diffusion model-based data augmentation and propose a Masked Conditional Diffusion Model (MCDM) for enhancing deepfake detection. MCDM generates a variety of forged faces from a masked pristine face, encouraging the deepfake detection model to learn generic and robust representations without overfitting to specific artifacts. Extensive experiments demonstrate that the forged images generated with our method are of high quality and help improve the performance of deepfake detection models.

Paper Structure (24 sections, 4 equations, 4 figures, 3 tables, 1 algorithm)

Figures (4)

  • Figure 1: Pipeline of fake sample generation. The previous method (top) generates samples by swapping the face regions of two faces. By contrast, our method (bottom) generates samples by completing the masked face region with MCDM. Details of MCDM are described in Section 3.
  • Figure 2: Overview of the proposed architecture. The input (real) images first enter the forward diffusion process, which generates noisy images from the real image using the diffusion formula. The random mask $m$ produced by the random mask generator and the noisy image are hybridized by the adaptive combine module and then fed into the inverse diffusion process to obtain the output (fake) image. The whole system is trained by jointly minimizing the diffusion loss $L_{pixel}$ and the reconstruction loss $L_{fea}$.
  • Figure 3: Visualization of deepfake images generated by different methods. From left to right: real images sampled from the FF++ dataset, followed by the corresponding deepfake images generated by DDPM ho2020denoising, IDDPM Nichol_Dhariwal_2021, ADM dhariwal2021diffusion, LDM Rombach_Blattmann_Lorenz_Esser_Ommer_2022, and our proposed method.
  • Figure 4: Grad-CAM Selvaraju_Cogswell_Das_Vedantam_Parikh_Batra_2020 visualization of different models. From top to bottom: the input images; the Grad-CAM of the model trained on "baseline" plus "subset2"; the Grad-CAM of the model trained on "baseline" plus an additional subset of the same size as "subset2" generated with ADM; and the Grad-CAM of the model trained on "baseline" plus an additional subset of the same size generated by our proposed method.
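The training step outlined in the Figure 2 caption can be sketched under several stated assumptions. The forward process is the standard DDPM closed form with a linear noise schedule. The adaptive combine module is replaced by a plain channel-wise concatenation of the noisy image, the masked real image, and the mask. This is a common inpainting-conditioning design, not the paper's module. $L_{pixel}$ is taken as the usual noise-prediction MSE, and $L_{fea}$ is an MSE between features of the predicted clean image and the real image, using a hypothetical fixed random projection as the feature extractor. The names `q_sample`, `build_input`, `predict_x0`, `joint_loss`, `features`, and the weight `lam` are illustrative only.

```python
import numpy as np

# Standard DDPM linear noise schedule (Ho et al., 2020); T and the beta range are assumed.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, noise):
    """Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def predict_x0(x_t, t, pred_noise):
    """Invert the forward process to estimate the clean image from predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * pred_noise) / np.sqrt(alpha_bar[t])

def build_input(x_t, x0, mask):
    """Hypothetical stand-in for the adaptive combine module: concatenate the noisy
    image, the masked real image, and the mask along the channel axis."""
    m = mask[..., None]
    return np.concatenate([x_t, x0 * (1.0 - m), m], axis=-1)

def joint_loss(pred_noise, noise, feat_fake, feat_real, lam=0.1):
    """L = L_pixel + lam * L_fea, both as mean squared errors; lam is an assumed weight."""
    l_pixel = np.mean((pred_noise - noise) ** 2)
    l_fea = np.mean((feat_fake - feat_real) ** 2)
    return l_pixel + lam * l_fea, l_pixel, l_fea

# Toy usage with random data; the denoiser output and feature extractor are placeholders.
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(32, 32, 3))  # real image scaled to [-1, 1]
mask = np.zeros((32, 32))
mask[8:24, 8:24] = 1.0                          # region to regenerate
noise = rng.standard_normal(x0.shape)
t = 500
x_t = q_sample(x0, t, noise)
net_in = build_input(x_t, x0, mask)             # shape (32, 32, 7): network input

W = rng.standard_normal((x0.size, 16))          # hypothetical fixed feature extractor
def features(x):
    return x.reshape(-1) @ W

pred_noise = np.zeros_like(noise)               # placeholder for a trained denoiser's output
x0_hat = predict_x0(x_t, t, pred_noise)         # estimated clean (fake) image
total, l_pixel, l_fea = joint_loss(pred_noise, noise, features(x0_hat), features(x0))
```

In an actual training loop, `pred_noise` would come from the denoising network applied to `net_in`, and gradients of `total` would update that network. Computing the feature term on the predicted clean image, rather than on the noise, is one way to make the loss reward preserving semantic content in the inpainted face.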