Shaking the Fake: Detecting Deepfake Videos in Real Time via Active Probes

Zhixin Xie, Jun Luo

TL;DR

SFake is a new real-time deepfake detection method that exploits deepfake models' inability to adapt to physical interference. It outperforms other detection methods with higher detection accuracy, faster processing speed, and lower memory consumption.

Abstract

Real-time deepfake, a type of generative AI, can "create" non-existent content in a video (e.g., swapping one person's face with another's). It has unfortunately been misused to produce deepfake videos during web conferences, video calls, and identity authentication for malicious purposes, including financial scams and political misinformation. Deepfake detection, the countermeasure against deepfake, has attracted considerable attention from the academic community, yet existing works typically rely on learning passive features that may perform poorly beyond the datasets they were trained on. In this paper, we propose SFake, a new real-time deepfake detection method that exploits deepfake models' inability to adapt to physical interference. Specifically, SFake actively sends probes that trigger mechanical vibrations on the smartphone, which produce a controllable feature in the footage. SFake then determines whether the face has been swapped by a deepfake model by checking whether the facial area is consistent with the probe pattern. We implement SFake, evaluate its effectiveness on a self-built dataset, and compare it with six other detection methods. The results show that SFake outperforms the other detection methods with higher detection accuracy, faster processing speed, and lower memory consumption.
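
The consistency check described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which feature is measured or how consistency is scored, so the per-frame feature (for example, a blur score of the facial area), the use of Pearson correlation, the function names `pearson` and `looks_swapped`, and the threshold of 0.5 are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def looks_swapped(probe_pattern, face_feature, threshold=0.5):
    """Flag a face as likely swapped when its per-frame feature does not follow the probe.

    probe_pattern: hypothetical per-frame vibration state (1 = vibrating, 0 = idle).
    face_feature:  hypothetical per-frame measurement taken on the facial area.
    threshold:     assumed cut-off; a real system would have to calibrate it.
    """
    return pearson(probe_pattern, face_feature) < threshold


probe = [0, 1, 0, 1, 0, 1, 0, 1]
real_face = [0.1, 0.9, 0.2, 0.8, 0.1, 0.9, 0.2, 0.8]   # follows the probe
fake_face = [0.5, 0.4, 0.6, 0.5, 0.4, 0.6, 0.5, 0.5]   # unrelated to the probe

print(looks_swapped(probe, real_face))
print(looks_swapped(probe, fake_face))
```

In this toy setup a genuine face inherits the vibration pattern and correlates strongly with the probe, whereas a generated face that ignores the physical interference does not.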

Paper Structure (28 sections, 10 equations, 18 figures, 2 tables)

Figures (18)

  • Figure 1: Overview of SFake. The video communication software actively induces physical probes by vibrating the smartphone in certain patterns. SFake then analyzes the video footage and determines the authenticity of the face by checking the consistency between the facial area and the probe pattern.
  • Figure 2: The main steps of FSA.
  • Figure 3: The four cases where the detection methods fail, with real images/videos, their corresponding fake images/videos, and their fakeness scores given by the detection model.
  • Figure 4: Intense shaking of the smartphone heavily distorts the target face, which in turn causes inconsistency in the resulting face (the circled part in the right picture).
  • Figure 5: (a) The three-axis acceleration when playing a song at maximum volume. (b) The three-axis acceleration when playing the "vibration" sound effect. (c) The z-axis acceleration and the corresponding calculated z-axis displacement.
  • ...and 13 more figures