Exposing the Deception: Uncovering More Forgery Clues for Deepfake Detection

Zhongjie Ba; Qingyu Liu; Zhenguang Liu; Shuang Wu; Feng Lin; Li Lu; Kui Ren

TL;DR

A novel framework that captures broader forgery clues by extracting multiple non-overlapping local representations and fusing them into a global, semantically rich feature, together with a Local Information Loss that guarantees the orthogonality of local representations while preserving comprehensive task-relevant information.
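
"Orthogonal local representations" can be made concrete with a small sketch. The function below computes the mean squared cosine similarity over every pair of local feature vectors $z_i, z_j$ ($i < j$). It is zero exactly when all pairs are orthogonal. This is a hypothetical proxy chosen only for illustration: the paper's Local Information Loss is derived from mutual information under information bottleneck theory, not from cosine similarity, and the name `orthogonality_penalty` is ours.

```python
import numpy as np

def orthogonality_penalty(z: np.ndarray, eps: float = 1e-8) -> float:
    """Mean squared cosine similarity over all distinct pairs (i, j), i < j.

    z: array of shape (K, D) holding K local feature vectors of dimension D.
    Returns 0.0 when every pair of vectors is orthogonal.
    Hypothetical illustration only; NOT the paper's Local Information Loss.
    """
    k = z.shape[0]
    if k < 2:
        return 0.0
    # Normalize each local feature to unit length (eps avoids division by zero).
    unit = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    # Gram matrix of pairwise cosine similarities between local features.
    cos = unit @ unit.T
    # Keep only the strictly upper triangle, i.e. distinct pairs i < j.
    iu = np.triu_indices(k, k=1)
    return float(np.mean(cos[iu] ** 2))

# Usage: orthogonal features give zero penalty, identical features give ~1.
print(orthogonality_penalty(np.eye(4)))                  # 0.0
print(round(orthogonality_penalty(np.ones((4, 4))), 6))  # 1.0
```

Squaring the cosine penalizes positive and negative correlation equally, so the penalty is small only when the local features point in genuinely different directions.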

Abstract

Deepfake technology has given rise to a spectrum of novel and compelling applications. Unfortunately, the widespread proliferation of high-fidelity fake videos has led to pervasive confusion and deception, shattering our faith that seeing is believing. One aspect overlooked so far is that current deepfake detection approaches can easily overfit, focusing only on forgery clues within one or a few local regions. Moreover, existing works rely heavily on neural networks to extract forgery features, without theoretical constraints guaranteeing that sufficient forgery clues are extracted and superfluous features are eliminated. These deficiencies result in unsatisfactory accuracy and limited generalizability in real-life scenarios. In this paper, we aim to tackle these challenges through three designs: (1) We present a novel framework that captures broader forgery clues by extracting multiple non-overlapping local representations and fusing them into a global, semantically rich feature. (2) Based on information bottleneck theory, we derive a Local Information Loss that guarantees the orthogonality of local representations while preserving comprehensive task-relevant information. (3) To fuse the local representations and remove task-irrelevant information, we derive a Global Information Loss through a theoretical analysis of mutual information. Empirically, our method achieves state-of-the-art performance on five benchmark datasets. Our code is available at https://github.com/QingyuLiu/Exposing-the-Deception, in the hope of inspiring further research.
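
Both losses build on the information bottleneck (IB) principle. For reference, the generic IB objective below seeks a representation $Z$ of the input $X$ that stays informative about the label $Y$ while discarding the rest of $X$. The second line only restates the abstract's orthogonality goal in mutual-information terms for $K$ local representations $z_1, \dots, z_K$ (notation assumed here). The exact forms of the paper's Local and Global Information Losses are derived in the paper and are not reproduced here.

```latex
% Generic information bottleneck Lagrangian (beta > 0 trades compression for relevance)
\min_{p(z \mid x)} \; \mathcal{L}_{IB} = I(X; Z) - \beta \, I(Z; Y)

% Orthogonality goal for K local representations, stated in mutual-information terms
I(z_i; z_j) = 0 \quad \text{for all } i \neq j, \; i, j \in \{1, \dots, K\}
```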

Paper Structure (26 sections, 23 equations, 7 figures, 7 tables)

1 Introduction
2 Related Work
3 Methodology
3.1 Overview
3.2 Local Disentanglement Module
3.3 Global Aggregation Module
4 Evaluation
4.1 Experimental Setup
4.2 Comparison with Existing Methods
4.3 Ablation Study
4.4 Visualization
4.5 Limitations
5 Conclusion
Appendix
A. Theorem Proof
...and 11 more sections

Figures (7)

Figure 1: Example visualization of four local salient features obtained by our method. Each feature focuses on distinct forgery regions with little overlap. We zoom in to show the detailed differences for these regions between a real sample and a fake sample. Our method can grasp broader forgery clues including blending ghosts, consistent and symmetrical skin tones, tooth details, and stitching seams.
Figure 2: Method overview. In the data preparation phase, we first extract frame-level facial bounding boxes from raw videos. For deepfake detection, our method consists of three modules. i) We first employ local information blocks $f_i$ to extract multiple disentangled local features $z_i$ corresponding to different forgery regions. We introduce a Local Information Loss to ensure that each $z_i$ carries comprehensive forgery-related information and is orthogonal to every $z_j$ ($j \neq i$). ii) We fuse all $z_i$ into a global feature $G$ under the guidance of a Global Information Loss. iii) Finally, $G$ is passed to the classification module to output the prediction result. We design our Local and Global Information Losses based on information bottleneck theory.
Figure 3: Information content of feature representations
Figure 4: Visual examples of our method on the forgery types within FF++ (C23): Deepfakes (DF), Face2Face (FF), FaceSwap (FS), and NeuralTextures (NT). We compare our method with and without $\mathcal{L}_{LIL}$ against Multi-Attentional, Face X-ray, and Xception.
Figure 5: In-dataset and cross-dataset performance with different numbers of local information blocks (LIBs). We train models on FF++ (C23) for 10 epochs and test them on Celeb-DF-V2.
...and 2 more figures
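
The three-module pipeline in Figure 2 can be sketched as a toy forward pass. Assumptions, all ours rather than the paper's: each local information block $f_i$ is a random linear map followed by ReLU, fusion into $G$ is concatenation followed by a linear projection, and the classification module is a single logistic unit. The real blocks are neural networks described in the paper, the Local and Global Information Losses are omitted, and the names `forward`, `local_w`, `fuse_w`, and `cls_w` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def forward(x, local_weights, fuse_weight, cls_weight):
    """Toy forward pass: K local blocks -> global feature G -> P(fake).

    x:             input vector of shape (D_in,), e.g. a flattened face crop
    local_weights: list of K matrices (D_local, D_in); stand-ins for f_i
    fuse_weight:   matrix (D_global, K * D_local); stand-in for the fusion step
    cls_weight:    vector (D_global,); stand-in for the classification module
    """
    # i) Each hypothetical local block f_i produces a local feature z_i.
    z = [relu(w @ x) for w in local_weights]
    # ii) Fuse all z_i into one global feature G (concatenate, then project).
    g = relu(fuse_weight @ np.concatenate(z))
    # iii) Classify G: sigmoid of a linear score gives the probability of "fake".
    logit = cls_weight @ g
    prob_fake = 1.0 / (1.0 + np.exp(-logit))
    return z, g, prob_fake

D_IN, D_LOCAL, D_GLOBAL, K = 64, 16, 32, 4
# Scale random weights by 1/sqrt(fan_in) so activations stay in a modest range.
local_w = [rng.normal(size=(D_LOCAL, D_IN)) / np.sqrt(D_IN) for _ in range(K)]
fuse_w = rng.normal(size=(D_GLOBAL, K * D_LOCAL)) / np.sqrt(K * D_LOCAL)
cls_w = rng.normal(size=D_GLOBAL) / np.sqrt(D_GLOBAL)

z, g, p = forward(rng.normal(size=D_IN), local_w, fuse_w, cls_w)
print(len(z), z[0].shape, g.shape, 0.0 <= p <= 1.0)  # 4 (16,) (32,) True
```

Nothing in this sketch forces the $z_i$ to attend to different regions; in the paper, that disentanglement is the job of the Local Information Loss.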
