Deepfake Detection via Knowledge Injection

Tonghui Li, Yuanfang Guo, Zeming Liu, Heqi Peng, Yunhong Wang

TL;DR

The paper tackles the generalization gap in deepfake detection by proposing Knowledge Injection based deepfake Detection (KID), a ViT-compatible multi-task framework that injects real-data knowledge into backbone models while learning forgery cues. It introduces an Injection Multi-Head Self-Attention (I-MSA) module to propagate authenticity information, a coarse-grained forgery localization branch to guide learning, and layer-wise suppression and contrast losses to balance real and fake knowledge. Empirical results on FF++ and multiple cross-dataset and cross-manipulation benchmarks show state-of-the-art generalization and faster convergence, with thorough ablations and qualitative analyses supporting the effectiveness of each component. The approach promises robust, real-world deepfake detection by better modeling both real and fake data distributions across unseen conditions.

Abstract

Deepfake detection technologies have become vital because current generative AI models can generate realistic deepfakes, which may be used for malicious purposes. Existing deepfake detection methods either rely on developing classification methods to better fit the distributions of the training data, or exploit forgery synthesis mechanisms to learn a more comprehensive forgery distribution. Unfortunately, these methods tend to overlook the essential role of real data knowledge, which limits their generalization ability in processing unseen real and fake data. To tackle these challenges, in this paper, we propose a simple and novel approach, named Knowledge Injection based deepfake Detection (KID), by constructing a multi-task learning based knowledge injection framework, which can be easily plugged into existing ViT-based backbone models, including foundation models. Specifically, a knowledge injection module is proposed to learn and inject necessary knowledge into the backbone model, to achieve a more accurate modeling of the distributions of real and fake data. A coarse-grained forgery localization branch is constructed to learn the forgery locations in a multi-task learning manner, to enrich the learned forgery knowledge for the knowledge injection module. Two layer-wise losses, a suppression loss and a contrast loss, are proposed to emphasize the knowledge of real data in the knowledge injection module, to further balance the proportions of real and fake knowledge. Extensive experiments demonstrate that KID possesses excellent compatibility with different scales of ViT-based backbone models, and achieves state-of-the-art generalization performance while enhancing the training convergence speed.
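To make the multi-task idea in the abstract more concrete, the following is a minimal sketch of how an image-level real/fake loss could be combined with a coarse-grained, patch-level localization loss. It is not the authors' implementation. The function names (`bce`, `coarse_mask`, `multitask_loss`), the "any forged pixel marks the patch" downsampling rule, the use of binary cross-entropy for both tasks, and the `loc_weight` value are all assumptions made for illustration. The suppression and contrast losses are not reproduced here because the abstract does not define them.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def coarse_mask(pixel_mask, patch):
    """Downsample a binary pixel mask (list of rows) to a patch grid.

    Assumption: a patch counts as forged (1) if any pixel inside it is forged.
    """
    h, w = len(pixel_mask), len(pixel_mask[0])
    grid = []
    for r in range(0, h, patch):
        row = []
        for c in range(0, w, patch):
            forged = any(
                pixel_mask[i][j]
                for i in range(r, min(r + patch, h))
                for j in range(c, min(c + patch, w))
            )
            row.append(1 if forged else 0)
        grid.append(row)
    return grid

def multitask_loss(cls_prob, label, patch_probs, patch_targets, loc_weight=0.5):
    """Image-level classification loss plus a weighted patch-level localization loss.

    loc_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    cls_loss = bce(cls_prob, label)
    flat_p = [p for row in patch_probs for p in row]
    flat_t = [t for row in patch_targets for t in row]
    loc_loss = sum(bce(p, t) for p, t in zip(flat_p, flat_t)) / len(flat_p)
    return cls_loss + loc_weight * loc_loss

# Usage: a 4x4 pixel mask with one forged pixel, reduced to a 2x2 patch grid.
pixel_mask = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
targets = coarse_mask(pixel_mask, patch=2)
uniform = [[0.5, 0.5], [0.5, 0.5]]  # an uninformed localization prediction
loss = multitask_loss(0.5, 1, uniform, targets)
```

In this sketch the localization target is deliberately coarse: the model is only asked which patches contain forged content, not for a pixel-accurate mask, which matches the "coarse-grained" wording of the abstract but not necessarily its exact construction.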

Paper Structure

This paper contains 21 sections, 11 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Biased classification boundary caused by the model's insufficient comprehension of real or fake image distribution. The vanilla method and fitting based method are usually limited to the training set, resulting in a biased classification boundary. Fake synthesis based methods have a better understanding of fake image distribution and thus establish more effective boundaries, but still lack a robust grasp of the characteristics of real images. Our proposed approach achieves a more thorough understanding of both real and fake images.
  • Figure 2: Overview of the Knowledge Injection based deepfake Detection framework.
  • Figure 3: Visualization of the authenticity correlation matrix in the self-attention equation. The Patch Activation in the second row represents the average correlation between each patch and all other patches in the matrix.
  • Figure 4: Training loss and validation AUC curves of different pre-trained models after incorporating the knowledge injection framework.
  • Figure 5: PCA visualization of the feature spaces of the basic ViT (a) and KID (b). The dots represent real images, the crosses represent forged images, and different colors represent different forgery methods in the CDF and FF++ test sets. The circles represent the boundaries of real image features within and across domains. Best viewed in color.
  • ...and 3 more figures