Decoupling Forgery Semantics for Generalizable Deepfake Detection

Wei Ye, Xinan He, Feng Ding

TL;DR

The paper tackles the generalization gap in DeepFake detection by decoupling image semantics into common forgery semantics ($Fc$), unique forgery semantics ($Fu$), and irrelevant content semantics ($C$). It introduces an adaptive high-pass filter and two modules, a multi-scale high-frequency feature extraction module (MHFE) and a multi-scale high-frequency feature fusion module (MHFF), together with a two-stage training process intended to make the decoupled semantics more independent: Stage 1 extracts all forgery semantics from high-frequency cues, while Stage 2 refines $Fc$ and $Fu$ and trains a detector for each. At inference, only $Fc$ is used to classify real vs. fake, with the aim of robust cross-domain performance; this is evaluated in intra-domain and cross-domain settings on the FF++, Celeb-DF, DFD, and DFDC datasets. Ablation and visualization analyses support the contributions of the high-frequency processing and the semantic disentanglement, suggesting practical improvements for generalizable deepfake detection.
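
To make "high-frequency cues" concrete, here is a minimal sketch of a generic high-pass filter: the image minus a scaled blurred copy of itself. It is not the paper's adaptive filter, whose design is not given in this summary. The names `box_blur`, `high_pass`, and `strength` are hypothetical, and `strength` is only a stand-in for whatever the real module adapts.

```python
def box_blur(img):
    """3x3 mean filter with edge replication; img is a list of rows of floats."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to image border
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def high_pass(img, strength=1.0):
    """Subtract a scaled low-pass copy of the image from the image.

    strength is an assumed knob, not a parameter from the paper:
    1.0 removes the local mean entirely, 0.0 returns the image unchanged.
    """
    low = box_blur(img)
    return [
        [p - strength * l for p, l in zip(row, low_row)]
        for row, low_row in zip(img, low)
    ]


if __name__ == "__main__":
    # A vertical edge: flat regions are suppressed, the edge columns are not.
    edge = [[0.0, 0.0, 10.0, 10.0] for _ in range(4)]
    for row in high_pass(edge):
        print([round(v, 2) for v in row])
```

Flat regions map to zero while edges and fine texture survive, which is the kind of signal a detector can use without relying on image content.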

Abstract

In this paper, we propose a novel method for detecting DeepFakes that improves the generalization of detection through semantic decoupling. There are now multiple DeepFake forgery technologies, each of which possesses unique forgery semantics but may also share common forgery semantics with the others. The unique forgery semantics and irrelevant content semantics may promote over-fitting and hamper generalization for DeepFake detectors. In our proposed method, the common forgery semantics are extracted from DeepFakes after decoupling and then used to improve the generalizability of DeepFake detectors. To further improve generalizability, we design an adaptive high-pass module and a two-stage training strategy that improve the independence of the decoupled semantics. Evaluation on the FF++, Celeb-DF, DFD, and DFDC datasets demonstrates our method's strong detection and generalization performance. Code is available at: https://github.com/leaffeall/DFS-GDD.

Paper Structure

This paper contains 14 sections, 16 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Comparison of our method with existing techniques. Orange represents semantics that enhance generalization, while grey indicates semantics that hinder it.
  • Figure 2: The overview of our method. In the entangled semantics, dark gray represents irrelevant content semantics $C$, light gray represents unique forgery semantics $Fu$, and orange represents common forgery semantics $Fc$. $Encoder_2$ utilizes branches to extract all forgery semantics from $Encoder_1$. Both $Decoder_1$ and $Decoder_2$ include processes for self-reconstruction and cross-reconstruction.
  • Figure 3: Visualization of Grad-CAM for Xception, UCF, and our approach across intra-domain (FF++) and cross-domain datasets (Celeb-DF, DFD, and DFDC).
  • Figure 4: The architecture details of $Encoder_1$ and $Decoder_1$ in our proposed method. (Left) In $Encoder_1$, $Extraction_1$ and $Extraction_2$ constitute a multi-scale high-frequency feature extraction module (MHFE), while $Fusion_1$ and $Fusion_2$ form a multi-scale high-frequency feature fusion module (MHFF). (Right) In $Decoder_1$, convolutional layers and SwiftFormer are used to reconstruct images.
  • Figure 5: The architecture details of $Encoder_2$ and $Decoder_2$ in our proposed method. (Left) In $Encoder_2$ during training stage 2, one branch first uses $Encoder_1$ to extract all forgery semantics, from which the common forgery semantics are then extracted. (Right) In $Decoder_2$, both branches employ only convolutional layers to reconstruct forgery semantics.
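
The Figure 2 caption mentions self-reconstruction and cross-reconstruction without saying how samples are paired. The toy sketch below shows one common reading of those terms in disentanglement work: decode each sample from its own codes (self), and decode after swapping forgery codes between two samples (cross). Everything in it is an illustrative assumption rather than the paper's architecture: the `encode`, `decode`, and `recombine` helpers are hypothetical, and splitting a list stands in for learned encoders.

```python
def encode(x, n_content):
    """Toy encoder (assumption): the first n_content values play the role of
    content semantics, the remaining values the role of forgery semantics."""
    return x[:n_content], x[n_content:]


def decode(content, forgery):
    """Toy decoder (assumption): recombine the two codes by concatenation."""
    return content + forgery


def recombine(x_a, x_b, n_content):
    """Return self- and cross-recombined outputs for a pair of samples."""
    c_a, f_a = encode(x_a, n_content)
    c_b, f_b = encode(x_b, n_content)
    return {
        # Self-reconstruction: each sample decoded from its own codes.
        "self_a": decode(c_a, f_a),
        "self_b": decode(c_b, f_b),
        # Cross-reconstruction: forgery codes swapped between the samples.
        "cross_ab": decode(c_a, f_b),  # content of a, forgery of b
        "cross_ba": decode(c_b, f_a),  # content of b, forgery of a
    }


if __name__ == "__main__":
    out = recombine([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], n_content=2)
    for name, vec in out.items():
        print(name, vec)
```

In a trained model, a reconstruction loss on such outputs would push content and forgery information into separate codes; how the paper defines its targets and losses is not stated in this summary.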