Contrastive Desensitization Learning for Cross Domain Face Forgery Detection

Lingyu Qiu, Ke Jiang, Xiaoyang Tan

TL;DR

The paper tackles cross-domain face forgery detection under zero-shot generalization, where unseen forgery methods cause high false alarms. It introduces the Contrastive Desensitization Network (CDN), a domain desensitization framework that learns a domain-invariant representation $Z$ by decomposing inputs into intrinsic features $I$ and domain information $D$, and by minimizing a KL-divergence-based objective between domain-perturbed latents. CDN employs a domain transformation to mix features, coupled with denoising reconstruction and a domain boundary constraint, all underpinned by a variational/inference justification that links denoising objectives to domain invariance. Extensive experiments on FF++, Celeb-DF, WildDeepfake, and DFDC show CDN achieves state-of-the-art cross-domain accuracy with substantially lower false alarm rates, while ablations verify the contribution of each component. The approach offers practical impact by delivering robust, domain-agnostic real-face representations that improve detector reliability in real-world settings, without requiring forgery examples during representation learning.
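To make the "KL-divergence-based objective between domain-perturbed latents" concrete, here is a minimal sketch that assumes each latent is a diagonal Gaussian described by per-dimension means and variances. That parameterization, the function name `kl_diag_gaussians`, and the toy values are illustrative assumptions, not details taken from the paper.

```python
import math

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal Gaussians (closed form, summed over dimensions)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total

# Hypothetical latents of one face under different domain perturbations,
# each given as (means, variances). These numbers are made up for illustration.
z_a = ([0.2, -1.0, 0.5], [1.0, 0.5, 2.0])
z_b = ([0.2, -1.0, 0.5], [1.0, 0.5, 2.0])  # same distribution as z_a
z_c = ([0.9, -0.4, 0.5], [1.0, 0.5, 2.0])  # means shifted by the perturbation

print(kl_diag_gaussians(*z_a, *z_b))  # identical distributions give 0.0
print(kl_diag_gaussians(*z_a, *z_c))  # strictly positive when the latents differ
```

Under this assumption, driving the divergence toward zero means the latent no longer changes when only the domain perturbation changes, which is the sense of "desensitization" used in the summary.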

Abstract

In this paper, we propose a new cross-domain face forgery detection method that is insensitive to different and possibly unseen forgery methods while ensuring an acceptably low false positive rate. Although existing face forgery detection methods are applicable to multiple domains to some degree, they often come with a high false positive rate, which can greatly disrupt the usability of the system. To address this issue, we propose a Contrastive Desensitization Network (CDN) based on a robust desensitization algorithm, which captures the essential domain characteristics by learning them from domain transformations over pairs of genuine face images. One advantage of CDN is that the learnt face representation is theoretically justified with regard to its robustness against domain changes. Extensive experiments over large-scale benchmark datasets demonstrate that our method achieves a much lower false alarm rate with improved detection accuracy compared to several state-of-the-art methods.

Paper Structure

This paper contains 35 sections, 2 theorems, 24 equations, 10 figures, 12 tables.

Key Result

Theorem 1

The optimal solution of minimizing the denoising reconstruction objective, Eq. (eq:denoisingreconstruction_DTV), guarantees that the representation is conditionally independent of the domain information given the intrinsic features, i.e., $Z \perp\!\!\!\perp D \mid I$.
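For reference, the conditional independence statement has its standard probabilistic meaning. Writing lowercase $z$, $d$, $i$ for realizations of $Z$, $D$, $I$ (a notational assumption made here, not taken from the paper), it reads:

$$
p(z, d \mid i) = p(z \mid i)\, p(d \mid i)
\quad\Longleftrightarrow\quad
p(z \mid i, d) = p(z \mid i) \ \text{ whenever } p(i, d) > 0 .
$$

In words: once the intrinsic features are known, the domain information tells us nothing further about the representation.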

Figures (10)

  • Figure 1: The diagram of the domain shift problem (left) shows that divergence between the source and target data distributions can lead to a high false alarm rate. We also perform reconstruction over cross-domain samples (right) and observe that the distribution of real face images reconstructed from the target dataset (WildDeepfake) differs significantly from that of the source domain (Celeb-DF), while overlapping substantially with that of the fake face images of the same source domain (Celeb-DF).
  • Figure 2: The overall architecture of the proposed CDN for face forgery detection, which learns domain-invariant representations $Z$ from given real face images $X$. During the training phase of the CDN framework, the input image $X$ is first processed by an encoder to extract its initial representation $z$. Next, $z$ is separated into intrinsic features $I$ and domain-specific features $D$ in the latent space. A domain transformation is then applied to mix $I$ and $D$, generating a new representation $z_{\text{out}}$. Finally, $z_{\text{out}}$ is passed through a decoder to reconstruct the original image, ensuring the removal of domain-specific noise while preserving intrinsic features. Three components support this objective, among them Intrinsic and Domain Alignment, which ensures consistency across domains while retaining intrinsic features, and Denoising Reconstruction, which enhances the reliability of domain-invariant representations via decoder-based reconstruction.
  • Figure 3: Diagram of the domain-invariant objective.
  • Figure 4: ROC curves of the compared methods under intra-dataset evaluation and cross-manipulation evaluation.
  • Figure 5: False alarm rate (FPR) ($\downarrow$) in cross-dataset testing among FF++, Celeb-DF (CDF), and WildDeepfake (WDF). The left two panels are for models trained on FF++, and the right two for models trained on CDF.
  • ...and 5 more figures
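The training flow described in the Figure 2 caption (encode, split the latent into $I$ and $D$, mix, decode) can be illustrated with a toy sketch. It assumes that the split is a fixed slice of the latent vector and that "mixing" means swapping the domain parts within a pair of real faces. Both are simplifying assumptions, and the helper names `split_latent`, `domain_transform`, and `mse` are hypothetical, not the authors' implementation.

```python
def split_latent(z, n_intrinsic):
    """Split a latent vector into an intrinsic part I and a domain part D.
    Assumption: the first n_intrinsic entries are I, the rest are D."""
    return z[:n_intrinsic], z[n_intrinsic:]

def domain_transform(z_a, z_b, n_intrinsic):
    """Keep each sample's intrinsic part and swap in the other sample's domain part."""
    i_a, d_a = split_latent(z_a, n_intrinsic)
    i_b, d_b = split_latent(z_b, n_intrinsic)
    return i_a + d_b, i_b + d_a

def mse(x, x_hat):
    """Mean squared error, a common choice for a reconstruction loss."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# Toy latents for two real faces drawn from different domains (made-up numbers).
z_a = [1.0, 2.0, 3.0, 4.0]
z_b = [5.0, 6.0, 7.0, 8.0]

z_out_a, z_out_b = domain_transform(z_a, z_b, n_intrinsic=2)
print(z_out_a)  # [1.0, 2.0, 7.0, 8.0]: intrinsic part of a, domain part of b
print(z_out_b)  # [5.0, 6.0, 3.0, 4.0]: intrinsic part of b, domain part of a
```

In a full model, a decoder would map `z_out_a` back to image space and a loss such as `mse` would compare that output with the original image of sample a, so that the reconstruction is pushed to depend on the intrinsic part rather than on the swapped-in domain part.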

Theorems & Definitions (6)

  • Definition 1
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Remark 1