Real is not True: Backdoor Attacks Against Deepfake Detection

Hong Sun, Ziqiang Li, Lei Liu, Bin Li

TL;DR

This study introduces "Bad-Deepfake," a novel backdoor attack paradigm against deepfake detectors, which achieves a 100% attack success rate (ASR) against widely used deepfake detectors.

Abstract

The proliferation of malicious deepfake applications has raised substantial public concern and cast doubt on the integrity of digital media. Although proficient deepfake detection mechanisms have been developed, they remain markedly vulnerable to a range of attacks. Notably, existing attacks are predominantly adversarial example attacks, which take place during the testing phase. In this study, we introduce Bad-Deepfake, a novel backdoor attack paradigm against deepfake detectors. Our approach manipulates a small subset of the training data, which gives us disproportionate influence over the behavior of the trained model. This manipulation exploits inherent weaknesses of deepfake detectors, allowing us to engineer triggers and to select the most effective samples for constructing the poisoned set. By combining these techniques, we achieve a 100% attack success rate (ASR) against widely used deepfake detectors.
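
To make "manipulating a small subset of the training data" concrete, the following is a minimal sketch of a generic dirty-label poisoning step with a blended-style trigger. It is not the Bad-Deepfake trigger construction or sample selection, which the abstract does not specify. The function name `poison_subset`, the blend weight `alpha`, the random choice of samples, and the label convention (0 = real, 1 = fake, with "real" as the attacker's target) are all assumptions made for illustration.

```python
import numpy as np

def poison_subset(images, labels, trigger, num_poison,
                  target_label=0, alpha=0.1, seed=0):
    """Blend `trigger` into `num_poison` randomly chosen images and relabel them.

    images: float array of shape (N, H, W, C) with values in [0, 1].
    labels: int array of shape (N,); 0 = real, 1 = fake (assumed convention).
    Returns poisoned copies of both arrays and the indices that were modified.
    """
    rng = np.random.default_rng(seed)
    # Dirty-label setting: only samples that do not already carry the target label.
    candidates = np.flatnonzero(labels != target_label)
    idx = rng.choice(candidates, size=num_poison, replace=False)

    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    # Blended-style trigger: convex combination of image and trigger pattern.
    poisoned_images[idx] = (1.0 - alpha) * images[idx] + alpha * trigger
    # Flip the label of every triggered sample to the attacker's target.
    poisoned_labels[idx] = target_label
    return poisoned_images, poisoned_labels, idx

# Toy data standing in for a real/fake face dataset (hypothetical shapes).
data_rng = np.random.default_rng(1)
images = data_rng.random((20, 8, 8, 3))
labels = np.array([0, 1] * 10)
trigger = data_rng.random((8, 8, 3))

p_images, p_labels, idx = poison_subset(images, labels, trigger, num_poison=3)
```

Random selection is used here only as a placeholder; the abstract states that the authors instead select the most effective samples for the poisoned set.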

Paper Structure (24 sections, 2 equations, 6 figures)

Figures (6)

  • Figure 1: Brief overview of poisoning-based backdoor attacks on deepfake detection. The attacker uses the selection, construction, and poisoning steps to build a mixed training set and then releases it. A user who trains a DNN on this set obtains a backdoored model. For the attacker, the number of poisoned samples in the mixed training set may affect the stealthiness of the attack. This study focuses on the construction and selection steps to improve poisoning efficiency against deepfake detection.
  • Figure 2: The attack success rate (ASR) of the proposed Bad-Deepfake, Blended+FUS, and the previously used Blended with random sampling in the dirty-label backdoor attack against deepfake detection, where the mixing ratio $r$ is the ratio of the number of poisoned samples to the number of clean samples. All results are averaged over three separate runs.
  • Figure 3: The benign accuracy (BA) of different attacks and the clean model in the dirty-label backdoor attack against deepfake detection. All results are averaged over three separate runs.
  • Figure 4: The attack success rate (ASR) of the proposed Bad-Deepfake, Blended+FUS, and the previously used Blended with random sampling in the clean-label backdoor attack against deepfake detection, where the mixing ratio $r$ is the ratio of the number of poisoned samples to the number of clean samples. All results are averaged over three separate runs.
  • Figure 5: Visualizations of poisoned samples with different triggers. Compared with Blended, samples poisoned with our method look more similar to the original image.
  • ...and 1 more figure
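
The quantities named in the captions above can be sketched as follows. The mixing ratio follows the caption's definition (poisoned count over clean count). ASR and BA are computed under their usual definitions in the backdoor literature: the fraction of triggered test samples classified as the target label, and the accuracy on clean test samples. The function names and the toy predictions are hypothetical and are not results from the paper.

```python
import numpy as np

def mixing_ratio(num_poisoned, num_clean):
    """Mixing ratio r: number of poisoned samples over number of clean samples."""
    return num_poisoned / num_clean

def attack_success_rate(preds_on_triggered, target_label):
    """Fraction of triggered test samples that the model assigns to the target label."""
    preds = np.asarray(preds_on_triggered)
    return float(np.mean(preds == target_label))

def benign_accuracy(preds_on_clean, true_labels):
    """Accuracy of the model on clean (untriggered) test samples."""
    preds = np.asarray(preds_on_clean)
    truth = np.asarray(true_labels)
    return float(np.mean(preds == truth))

def mean_over_runs(values):
    """Average a metric across independent runs (the captions use three)."""
    return float(np.mean(values))

# Toy numbers for illustration only.
r = mixing_ratio(num_poisoned=10, num_clean=1000)
asr = attack_success_rate([0, 0, 0, 1], target_label=0)
ba = benign_accuracy([0, 1, 1, 0], [0, 1, 0, 0])
avg_asr = mean_over_runs([1.0, 0.5, 0.75])
```

Reporting ASR together with BA matters because a backdoor is only stealthy if the poisoned model still behaves like the clean model on untriggered inputs.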