Frequency Masking for Universal Deepfake Detection
Chandler Timm Doloriel, Ngai-Man Cheung
TL;DR
This work tackles universal deepfake detection, where a detector must generalize to generative AI methods unseen during training. It introduces masked image modeling in a supervised setting: masks are applied in the spatial or frequency domain during training to promote robust feature learning, and no masking is applied at test time. Empirical results show that frequency-domain masking generalizes better than spatial masking, reaching a peak mAP of 88.22% at a 15% masking ratio and consistently improving performance when combined with existing state-of-the-art methods across GANs and diffusion models. The findings point to frequency-domain artifacts as durable cues for deepfake detection and offer a practical training-time regularization strategy for improving cross-model robustness.
Abstract
We study universal deepfake detection. Our goal is to detect synthetic images from a range of generative AI approaches, particularly emerging ones that are unseen during training of the deepfake detector. Universal deepfake detection therefore requires outstanding generalization capability. Motivated by recently proposed masked image modeling, which has demonstrated excellent generalization in self-supervised pre-training, we make the first attempt to explore masked image modeling for universal deepfake detection. We study spatial and frequency domain masking in training deepfake detectors. Based on empirical analysis, we propose a novel deepfake detector via frequency masking. Our focus on the frequency domain differs from the majority of existing work, which primarily targets spatial domain detection. Our comparative analyses reveal substantial performance gains over existing methods. Code and models are publicly available.
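To make the idea of training-time frequency masking concrete, the following is a minimal NumPy sketch, not the authors' released implementation. It assumes that masking means zeroing a uniformly random fraction of 2-D FFT coefficients and then transforming back to pixel space; how the bins are actually selected (for example, restricted to particular frequency bands) is not specified in this summary. The names `frequency_mask` and `preprocess` are hypothetical, and the default ratio of 0.15 simply mirrors the 15% masking ratio quoted in the TL;DR.

```python
import numpy as np


def frequency_mask(image, ratio=0.15, rng=None):
    """Zero out a random fraction of an image's 2-D frequency coefficients.

    image: array of shape (H, W) or (H, W, C).
    ratio: fraction of frequency bins to zero. Choosing bins uniformly at
           random over the whole spectrum is an assumption of this sketch.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=np.float64)

    # Forward 2-D FFT over the two spatial axes; channels stay separate.
    spectrum = np.fft.fft2(image, axes=(0, 1))

    # Build one boolean keep-mask over the (H, W) bins, shared by all channels.
    h, w = image.shape[:2]
    n_masked = int(round(ratio * h * w))
    keep = np.ones(h * w, dtype=bool)
    keep[rng.choice(h * w, size=n_masked, replace=False)] = False
    keep = keep.reshape(h, w)
    if image.ndim == 3:
        keep = keep[:, :, None]

    # Inverse FFT back to pixel space. The random mask is not conjugate
    # symmetric, so the result is complex in general; keep the real part.
    return np.fft.ifft2(spectrum * keep, axes=(0, 1)).real


def preprocess(image, training, ratio=0.15, rng=None):
    """Apply frequency masking during training only; test images pass through."""
    image = np.asarray(image, dtype=np.float64)
    return frequency_mask(image, ratio, rng) if training else image
```

In a training loop under these assumptions, each image would pass through `preprocess(image, training=True)` before being fed to the detector, while evaluation would call it with `training=False` so that the test input is left unmasked, matching the train-only masking described above.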
