Masked Conditional Diffusion Model for Enhancing Deepfake Detection
Tiewen Chen, Shanmin Yang, Shu Hu, Zhenghan Fang, Ying Fu, Xi Wu, Xin Wang
TL;DR
The paper addresses the poor cross-dataset generalization of deepfake detectors by augmenting training data with a Masked Conditional Diffusion Model (MCDM) that inpaints masked facial regions to produce diverse, high-quality forgeries. MCDM uses a random mask generator and a diffusion process conditioned on masked real images, optimizing a joint pixel- and feature-level loss to preserve semantic content while promoting robustness. Empirical results on FF++ and cross-dataset tests (CDF, DFD) show superior intra-dataset AUC and meaningful cross-dataset gains, with ablations confirming the benefits of mask-conditioning and feature reconstruction loss. This approach enhances detection robustness and generalization, offering a practical pathway to more reliable deepfake defense systems with diffusion-based augmentation.
Abstract
Recent studies on deepfake detection have achieved promising results when training and testing faces are from the same dataset. However, their results severely degrade when confronted with forged samples that the model has not seen during training. In this paper, we use deepfake data to help detect deepfakes: we offer a new insight into diffusion model-based data augmentation and propose a Masked Conditional Diffusion Model (MCDM) for enhancing deepfake detection. It generates a variety of forged faces from a masked pristine one, encouraging the deepfake detection model to learn generic and robust representations without overfitting to specific artifacts. Extensive experiments demonstrate that forged images generated with our method are of high quality and help improve the performance of deepfake detection models.
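
The masking and loss ideas summarized above can be illustrated with a toy sketch. This is not the authors' implementation: the rectangular mask shape, the size range (`min_frac`, `max_frac`), the 2x2 average pooling used as a stand-in "feature extractor", and the weight `lam` are all assumptions chosen for illustration, and the diffusion denoiser itself is omitted. Here `pred` simply stands for whatever image a model would produce from the masked input.

```python
import random


def random_rect_mask(height, width, rng, min_frac=0.2, max_frac=0.6):
    """Binary mask: 0 inside one random rectangle, 1 elsewhere (assumed mask shape)."""
    mh = max(1, int(height * rng.uniform(min_frac, max_frac)))
    mw = max(1, int(width * rng.uniform(min_frac, max_frac)))
    top = rng.randint(0, height - mh)
    left = rng.randint(0, width - mw)
    return [[0 if top <= r < top + mh and left <= c < left + mw else 1
             for c in range(width)]
            for r in range(height)]


def apply_mask(image, mask):
    """Zero out masked pixels; the result plays the role of the conditioning image."""
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def mse(a, b):
    """Mean squared error between two equally sized 2D lists."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def avg_pool2(image):
    """Hypothetical stand-in for a feature extractor: 2x2 average pooling (even dims)."""
    return [[(image[r][c] + image[r][c + 1]
              + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
             for c in range(0, len(image[0]), 2)]
            for r in range(0, len(image), 2)]


def joint_loss(pred, target, lam=0.5):
    """Pixel-level MSE plus lam times feature-level MSE (lam is an assumed weight)."""
    return mse(pred, target) + lam * mse(avg_pool2(pred), avg_pool2(target))


rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]  # toy grayscale "face"
mask = random_rect_mask(8, 8, rng)
masked = apply_mask(image, mask)

loss_perfect = joint_loss(image, image)   # perfect reconstruction
loss_masked = joint_loss(masked, image)   # masked input used as a naive "prediction"
```

In this sketch a perfect reconstruction gives zero loss, while leaving the masked region empty is penalized at both the pixel and the feature level. A real pipeline would replace `avg_pool2` with a learned feature extractor and produce `pred` by running a conditional diffusion model on `masked`.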
