FreqBlender: Enhancing DeepFake Detection by Blending Frequency Knowledge

Hanzhe Li, Jiaran Zhou, Yuezun Li, Baoyuan Wu, Bin Li, Junyu Dong

TL;DR

This paper investigates the major frequency components of face images and proposes a Frequency Parsing Network that adaptively partitions the frequency components related to forgery traces. Because no ground truth exists for these components, it also describes a dedicated training strategy that uses the inner correlations among different frequency knowledge to guide learning.

Abstract

Generating synthetic fake faces, known as pseudo-fake faces, is an effective way to improve the generalization of DeepFake detection. Existing methods typically generate these faces by blending real or fake faces in the spatial domain. While these methods have shown promise, they overlook the simulation of the frequency distribution in pseudo-fake faces, which limits in-depth learning of generic forgery traces. To address this, this paper introduces FreqBlender, a new method that generates pseudo-fake faces by blending frequency knowledge. Concretely, we investigate the major frequency components and propose a Frequency Parsing Network to adaptively partition the frequency components related to forgery traces. We then blend this frequency knowledge from fake faces into real faces to generate pseudo-fake faces. Since there is no ground truth for frequency components, we describe a dedicated training strategy that leverages the inner correlations among different frequency knowledge to guide the learning process. Experimental results demonstrate the effectiveness of our method in enhancing DeepFake detection, making it a potential plug-and-play strategy for other methods.
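
To make the idea of blending frequency content from a fake face into a real face concrete, the following is a minimal sketch and not the paper's implementation. It assumes a 2D FFT as the transform and a fixed radial band mask as a stand-in for the partition that the Frequency Parsing Network would learn; the function names and the `low`/`high` band parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def radial_band_mask(shape, low, high):
    """Boolean mask of FFT bins whose radial frequency lies in [low, high).

    Frequencies are in cycles per pixel, so each axis spans [-0.5, 0.5).
    Assumption: a hand-picked band replaces the learned partition.
    """
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return (radius >= low) & (radius < high)


def blend_frequency_band(real, fake, low, high):
    """Copy the [low, high) frequency band of `fake` into `real`.

    Both inputs are 2D grayscale arrays of the same shape.
    """
    if real.shape != fake.shape:
        raise ValueError("real and fake must have the same shape")
    mask = radial_band_mask(real.shape, low, high)
    real_f = np.fft.fft2(real)
    fake_f = np.fft.fft2(fake)
    # Take masked bins from the fake spectrum, all others from the real one.
    blended_f = np.where(mask, fake_f, real_f)
    # The mask is symmetric in frequency, so the inverse transform is real
    # up to floating-point rounding; discard the residual imaginary part.
    return np.real(np.fft.ifft2(blended_f))
```

For example, `blend_frequency_band(real_face, fake_face, 0.1, 0.3)` would return an image that keeps the real face's content outside the chosen band and carries the fake face's content inside it. In the method described above, which components to transfer is decided adaptively by the network rather than by fixed thresholds.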

Paper Structure (13 sections, 6 equations, 7 figures, 12 tables)

Figures (7)

  • Figure 1: Overview of our method. In contrast to existing spatial-blending methods (right part), our method explores face blending in the frequency domain (left part). By leveraging frequency knowledge, our method can generate pseudo-fake faces that closely resemble the distribution of wild fake faces. Our method can complement and work in conjunction with existing spatial-blending methods.
  • Figure 2: Statistics of frequency distribution. The top part shows the frequency distribution of real and fake faces, computed using the algorithms in [durall2019unmasking, durall2020watch]. The bottom part shows the frequency difference between real and fake faces. The vertical axis uses a base-$2$ logarithmic scale.
  • Figure 3: Visualization of the frequency difference between real and fake faces. Lighter colors indicate larger differences.
  • Figure 4: Image visualization corresponding to different frequency components.
  • Figure 5: Overview of the proposed Frequency Parsing Network (FPNet). Given an input face image, our method can partition it into three frequency components, corresponding to the semantic information, structural information, and noise information respectively. Since there is no ground truth, we propose four corollaries to supervise the training. The architecture of the encoder and decoders is shown in the right part.
  • ...and 2 more figures