Boosting Active Defense Persistence: A Two-Stage Defense Framework Combining Interruption and Poisoning Against Deepfake

Hongrui Zheng, Yuezun Li, Liejun Wang, Yunfeng Diao, Zhiqing Guo

Abstract

Active defense strategies have been developed to counter the threat of deepfake technology. Their primary weakness is a lack of persistence: attackers can bypass them simply by collecting protected samples and retraining their models, so static defenses inevitably fail, which severely limits practical use. We argue that an effective defense must not only distort forged content but also block the model's ability to adapt when attackers retrain on protected images. To achieve this, we propose a Two-Stage Defense Framework (TSDF). Using the intensity separation mechanism designed in this paper, the framework builds dual-function adversarial perturbations that play two roles. First, they directly distort the forged results. Second, they act as a poisoning vehicle that disrupts the data preparation process essential to an attacker's retraining pipeline. By poisoning the data source, TSDF aims to prevent the attacker's model from adapting to the defensive perturbations, so that the defense remains effective in the long term. Comprehensive experiments show that the performance of traditional interruption methods degrades sharply under adversarial retraining, whereas our framework shows a strong dual defense capability, which can improve the persistence of active defense. Our code will be available at https://github.com/vpsg-research/TSDF.

Paper Structure

This paper contains 23 sections, 17 equations, 7 figures, 8 tables, 1 algorithm.

Figures (7)

  • Figure 1: Illustration of an interruption-based defense and its failure under retraining. The lower path shows an effective interruption process, where protected face data leads to a distorted output from the deepfake model. The upper path shows a critical flaw in this interruption-only defense: attackers can bypass it by retraining their model on the protected images. This adaptation makes the model immune to the interruption, ultimately causing the defense to fail.
  • Figure 2: Comparison of three active defense strategies. (a) Defense Before Deepfake Appears: a poisoning strategy applied before deepfake model training. The defender poisons the original training data by interfering with face detection. This prevents the attacker's model from learning effective facial features, causing the model creation to fail. (b) Defense After Deepfake Appears: an interruption strategy applied at the model inference stage. The defender adds interruption perturbations to the face images that need to be protected. When these protected images are fed into a trained deepfake model, a distorted output is produced. (c) Two-Stage Defense: the framework proposed in this paper, which combines the previous two strategies in one perturbation. The framework applies dual protection to the original images, allowing it to both distort the forged result at the inference stage and poison the dataset at the training stage, thus improving the persistence of active defense.
  • Figure 3: The overall framework of the TSDF method. The process begins in the interruption stage (upper left), where an initial perturbation (W) is generated by feeding a perturbed image into several deepfake models and optimizing W to maximize the feature-level Mean Squared Error loss ($\mathcal{L}_{MSE}$) and minimize a feature enhancement loss ($\mathcal{L}_{enh}$). Subsequently, the poisoning stage (lower left) generates a specialized perturbation that attacks multiple face detectors by maximizing a feature-level loss ($\mathcal{L}_{feat}$) and an output-level loss ($\mathcal{L}_{output}$). In the combination and optimization stage (right side), the two perturbations are fused. A threshold $\tau$ identifies low-intensity regions in the interruption perturbation, creating a mask where the poisoning perturbation is selectively applied.
  • Figure 4: Implementation diagram of intensity separation mechanism.
  • Figure 5: Visual comparison of the interruption effect. This figure compares the outputs of deepfake models when protected by CMUA, FOUND, and TSDF.
  • ...and 2 more figures
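
The intensity separation step described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `fuse_perturbations`, the choice to replace (rather than add to) the interruption perturbation inside the mask, the comparison `|W| < tau`, and the L-infinity budget `eps` are all assumptions made here for the example.

```python
import numpy as np


def fuse_perturbations(w_interrupt, w_poison, tau, eps):
    """Fuse two perturbations with an intensity-based mask (illustrative only).

    Pixels where |w_interrupt| < tau are treated as low-intensity and take the
    poisoning perturbation; all other pixels keep the interruption perturbation.
    """
    # Boolean mask: True where the interruption perturbation is weak.
    low_intensity = np.abs(w_interrupt) < tau
    # Assumption: the poisoning perturbation replaces the weak regions.
    fused = np.where(low_intensity, w_poison, w_interrupt)
    # Assumption: the fused result is kept inside an L-infinity budget eps.
    return np.clip(fused, -eps, eps), low_intensity


# Toy 2x2 "image" perturbations with hand-picked values.
w_int = np.array([[0.04, 0.01], [-0.03, 0.0]])
w_poi = np.array([[0.02, -0.02], [0.02, -0.02]])
fused, mask = fuse_perturbations(w_int, w_poi, tau=0.02, eps=0.05)
```

In this toy case the mask selects the two entries of `w_int` whose magnitude is below `tau`, so only those positions carry the poisoning values while the stronger interruption values are left untouched.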