GM-DF: Generalized Multi-Scenario Deepfake Detection
Yingxin Lai, Zitong Yu, Jing Yang, Bin Li, Xiangui Kang, Linlin Shen
TL;DR
This work tackles the problem of generalization in deepfake detection across diverse datasets. It introduces GM-DF, a Generalized Multi-Scenario Deepfake Detection framework that combines domain-specific feature extraction via a hybrid mixture-of-experts (MoE) module, CLIP-based common feature alignment, and a masked image modeling head, all trained under a domain-aware meta-learning objective with a domain alignment loss. The approach yields state-of-the-art performance on both traditional single-domain protocols and a newly proposed multi-domain benchmark, demonstrating strong cross-domain generalization and robustness to distortions. The results suggest a viable path toward unified forgery detectors that operate across varied real-world scenarios and datasets, with potential implications for scalable forgery foundation models.
Abstract
Existing face forgery detection methods usually train models on a single domain, which limits generalization when unseen scenarios and unknown attacks occur. In this paper, we systematically investigate the generalization capacity of deepfake detection models that are jointly trained on multiple face forgery detection datasets. We first observe a rapid degradation of detection accuracy when models are trained directly on combined datasets, caused by discrepancies across collection scenarios and generation methods. To address this issue, we propose a Generalized Multi-Scenario Deepfake Detection framework (GM-DF) that serves multiple real-world scenarios with a unified model. First, we propose a hybrid expert modeling approach for domain-specific real/forgery feature extraction. Second, for the common representation, we use CLIP to extract common features and better align visual and textual features across domains. Third, we introduce a masked image reconstruction mechanism that forces the model to capture rich forgery details. Finally, we supervise the model with a domain-aware meta-learning strategy to further enhance its generalization capacity. Specifically, we design a novel domain alignment loss that strongly aligns the distributions of the meta-test and meta-train domains. The updated model can thus represent both domain-specific and common real/forgery features across multiple datasets. Because multi-dataset training has received little study, we establish a new benchmark that leverages multi-source data to fairly evaluate generalization to unseen scenarios. Qualitative and quantitative experiments on five datasets, under both traditional protocols and the proposed benchmark, demonstrate the effectiveness of our approach.
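To make the meta-train/meta-test alignment idea concrete, here is a minimal sketch in plain Python. The abstract does not state the form of the domain alignment loss, so this example assumes a simple stand-in: the squared distance between the mean feature vectors of the two domain groups (first-moment matching). The domain names, toy features, and the helper functions `mean_vector`, `alignment_loss`, and `split_domains` are all hypothetical and are not taken from the paper.

```python
import random


def mean_vector(features):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]


def alignment_loss(train_feats, test_feats):
    """Squared Euclidean distance between the mean features of two groups.

    Assumed stand-in for a domain alignment loss; the paper's actual
    formulation may differ.
    """
    mu_train = mean_vector(train_feats)
    mu_test = mean_vector(test_feats)
    return sum((a - b) ** 2 for a, b in zip(mu_train, mu_test))


def split_domains(domains, n_meta_test=1, seed=0):
    """Randomly split domain names into (meta_train, meta_test) lists."""
    rng = random.Random(seed)
    shuffled = list(domains)
    rng.shuffle(shuffled)
    return shuffled[n_meta_test:], shuffled[:n_meta_test]


# Hypothetical toy features: each domain holds a few 2-D feature vectors.
features = {
    "domain_a": [[0.0, 1.0], [2.0, 3.0]],
    "domain_b": [[1.0, 2.0], [1.0, 2.0]],
    "domain_c": [[4.0, 6.0], [6.0, 6.0]],
}

meta_train, meta_test = split_domains(list(features))
train_feats = [f for d in meta_train for f in features[d]]
test_feats = [f for d in meta_test for f in features[d]]

# In a real training loop this term would be added to the detection loss
# computed on the meta-test domains; here it is only evaluated once.
loss = alignment_loss(train_feats, test_feats)
print(meta_train, meta_test, loss)
```

In this sketch the loss is zero when the two groups share the same mean feature and grows as their means drift apart. A practical implementation would operate on learned embeddings and would likely match richer distribution statistics than the mean alone.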
