A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real World

Jikang Cheng, Renye Yan, Zhiyuan Yan, Yaozhong Gan, Xueyi Zhang, Zhongyuan Wang, Wei Peng, Ling Liang

TL;DR

The paper introduces Multi-In-Domain Face Forgery Detection (MID-FFD), a realistic setting in which a detector is trained on many real/fake domains and must then give a definitive real-or-fake judgment for each frame without knowing which domain the frame comes from. It proposes DevDet, a two-stage, model-agnostic framework that first amplifies forgery cues with a Face Forgery Developer (FFDev) and then adapts the detector with Dose-Adaptive Fine-Tuning (DAFT), which uses a DoseDict to maintain generalization. The reported experiments show improved real/fake discrimination on domain-unspecified inputs while preserving cross-domain performance, addressing key limitations of prior generalizable and incremental approaches. The components (FFDev, DAFT, DoseDict) can be plugged into existing detectors, suggesting a viable path toward reliable real-world deployment.

Abstract

Existing methods for deepfake detection aim to develop generalizable detectors. Although generalization is the ultimate goal, with limited training forgeries and domains it appears idealistic to expect generalization that covers entirely unseen variations, especially given the diversity of real-world deepfakes. Introducing large-scale multi-domain data for training is therefore feasible and important for real-world applications. However, in such a multi-domain scenario, the differences between domains, rather than the subtle real/fake distinctions, dominate the feature space. As a result, although detectors can separate real from fake reasonably well within each domain (i.e., high AUC), they struggle with single-image real/fake judgments in domain-unspecified conditions (i.e., low ACC). In this paper, we first define a new research paradigm named Multi-In-Domain Face Forgery Detection (MID-FFD), which includes sufficient volumes of real-fake domains for training. The detector must then provide definitive real-fake judgments for domain-unspecified inputs, which simulates the frame-by-frame independent detection scenario in the real world. To address the domain-dominance issue, we propose a model-agnostic framework termed DevDet (Developer for Detector) that amplifies real/fake differences and makes them dominant in the feature space. DevDet consists of a Face Forgery Developer (FFDev) and a Dose-Adaptive detector Fine-Tuning strategy (DAFT). Experiments demonstrate our superiority in predicting real-fake under the MID-FFD scenario while maintaining the original generalization ability to unseen data.
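
The gap between high per-domain AUC and low domain-unspecified ACC can be illustrated with a small synthetic sketch. This is not the paper's experiment: the score distributions, the two toy domains, and the helper names `auc` and `accuracy` are all assumptions chosen only to make the domain shift larger than the real/fake gap.

```python
import numpy as np


def auc(real, fake):
    """Probability that a random fake outscores a random real (ties count half)."""
    diff = fake[:, None] - real[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def accuracy(real, fake, threshold):
    """Accuracy of the single rule: predict fake when score > threshold."""
    correct = (real <= threshold).sum() + (fake > threshold).sum()
    return float(correct) / (len(real) + len(fake))


rng = np.random.default_rng(0)
n = 500  # samples per class per domain (arbitrary)

# Assumed toy "fake" scores. Within each domain, fakes score higher than reals,
# but the offset between domains (0.6) is much larger than the real/fake gap (0.2).
d1_real = rng.normal(0.10, 0.03, n)
d1_fake = rng.normal(0.30, 0.03, n)
d2_real = rng.normal(0.70, 0.03, n)
d2_fake = rng.normal(0.90, 0.03, n)

# Domain-unspecified evaluation: pool both domains and apply one fixed threshold.
pooled_real = np.concatenate([d1_real, d2_real])
pooled_fake = np.concatenate([d1_fake, d2_fake])

auc_d1 = auc(d1_real, d1_fake)
auc_d2 = auc(d2_real, d2_fake)
acc_pooled = accuracy(pooled_real, pooled_fake, threshold=0.5)

# For contrast: accuracy when each domain is allowed its own threshold.
acc_d1_own = accuracy(d1_real, d1_fake, threshold=0.20)
acc_d2_own = accuracy(d2_real, d2_fake, threshold=0.80)

print(f"per-domain AUC: D1={auc_d1:.3f}, D2={auc_d2:.3f}")
print(f"pooled ACC at threshold 0.5: {acc_pooled:.3f}")
print(f"ACC with per-domain thresholds: D1={acc_d1_own:.3f}, D2={acc_d2_own:.3f}")
```

Under these assumed distributions, both per-domain AUCs come out close to 1, yet the pooled accuracy at the fixed 0.5 threshold sits near chance, because every D1 sample falls below the threshold and every D2 sample falls above it. Accuracy recovers only when each domain gets its own threshold, which is not available when the input's domain is unspecified.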

Paper Structure

This paper contains 28 sections, 16 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: MID-FFD trains on data from multiple domains and tests on domain-unspecified inputs frame by frame, with an independent, definitive real-fake judgment for each (i.e., ACC). Please refer to Fig. \ref{fig:first-dist} for the challenge of MID-FFD.
  • Figure 2: t-SNE visualization of detectors trained with two domains (D1: FF++, D2: WDF). Real and fake within each specific domain are relatively well separated, as shown by their promising in-domain AUC. However, in real-world applications, domain-unspecified test inputs cannot be directly judged as real or fake when they fall within the gap between the differing decision boundaries of D1 and D2, which is caused by the dominance of domain distinction over real/fake distinction in the feature space. Further visualization results can be found in Fig. \ref{fig:exp_dist}.
  • Figure 3: The two-stage architecture of the proposed DevDet.
  • Figure 4: t-SNE visualization of the feature space. Here, Effnb4 is used as the base model, consistent with Tab. \ref{tab:main}. The black dotted lines are guide lines indicating the possible decision boundary of each specified domain. MID Base has multiple distinct decision boundaries across different domains, leading to poor ACC when the input is domain-unspecified. Our result holds a consistent boundary for definitive real-fake judgment. Zoom in for better illustration.
  • Figure 5: Grad-CAM visualization of the saliency map associated with classifying as fake. We show two datasets and two conditions, Maintain Easy and Enhance Hard.
  • ...and 1 more figure