LoRA Patching: Exposing the Fragility of Proactive Defenses against Deepfakes

Zuomin Qu, Yimao Guo, Qianyue Hu, Wei Lu

TL;DR

Proactive defenses against Deepfakes are shown to be fragile. The authors propose LoRA patching, a plug-and-play low-rank adaptation patch for Deepfake generators that uses a learnable gating mechanism and a Multi-Modal Feature Alignment (MMFA) loss to bypass these defenses while preserving realistic forgery quality. They demonstrate strong bypass performance on CelebA with only 1,000 facial images and a single epoch of fine-tuning. They also introduce defensive LoRA patching, which embeds visible warning watermarks in the generated outputs. The work highlights a critical security vulnerability in current defense paradigms and calls for more robust, verification-centric defenses against Deepfakes.

Abstract

Deepfakes pose significant societal risks, motivating the development of proactive defenses that embed adversarial perturbations in facial images to prevent manipulation. However, in this paper, we show that these preemptive defenses often lack robustness and reliability. We propose a novel approach, Low-Rank Adaptation (LoRA) patching, which injects a plug-and-play LoRA patch into Deepfake generators to bypass state-of-the-art defenses. A learnable gating mechanism adaptively controls the effect of the LoRA patch and prevents gradient explosions during fine-tuning. We also introduce a Multi-Modal Feature Alignment (MMFA) loss, encouraging the features of adversarial outputs to align with those of the desired outputs at the semantic level. Beyond bypassing, we present defensive LoRA patching, embedding visible warnings in the outputs as a complementary solution to mitigate this newly identified security vulnerability. With only 1,000 facial examples and a single epoch of fine-tuning, LoRA patching successfully defeats multiple proactive defenses. These results reveal a critical weakness in current paradigms and underscore the need for more robust Deepfake defense strategies. Our code is available at https://github.com/ZOMIN28/LoRA-Patching.
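
The gated patch described above can be illustrated with a minimal NumPy sketch on a single linear layer. The paper patches linear, convolutional, and transposed convolutional layers; a linear layer is used here only for brevity. The forward form y = W x + tanh(gate) · (alpha / rank) · B A x, the tanh-bounded gate, and the names `GatedLoRALinear`, `rank`, `alpha`, and `gate_init` are all assumptions made for illustration. This is not the authors' implementation, which is available in the linked repository.

```python
import numpy as np


class GatedLoRALinear:
    """Hypothetical gated LoRA patch around a frozen linear layer (illustration only).

    Assumed forward pass: y = W x + tanh(gate) * (alpha / rank) * B (A x)
    """

    def __init__(self, weight, rank=4, alpha=8.0, gate_init=0.5, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen base weight
        self.scale = alpha / rank                             # standard LoRA scaling
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))  # trainable down-projection
        self.B = np.zeros((out_dim, rank))                    # trainable up-projection
        # B starts at zero, so the patched layer initially equals the frozen one.
        # The gate logit starts nonzero so that B still receives gradient; if B and
        # the gate were both zero, neither would get a gradient.
        self.gate = gate_init

    def forward(self, x):
        """x has shape (batch, in_dim); returns shape (batch, out_dim)."""
        base = x @ self.weight.T                   # frozen layer output
        patch = (x @ self.A.T) @ self.B.T          # low-rank update B A x
        # tanh keeps the effective gate in (-1, 1), bounding the patch's influence
        return base + np.tanh(self.gate) * self.scale * patch

    def num_patch_params(self):
        """Trainable parameters added by the patch (A, B, and one gate)."""
        return self.A.size + self.B.size + 1


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(size=(16, 32))
    layer = GatedLoRALinear(W, rank=4)
    x = rng.normal(size=(2, 32))
    print("matches frozen layer at init:", np.allclose(layer.forward(x), x @ W.T))
    layer.B = rng.normal(size=(16, 4))             # pretend B has been trained
    print("matches after training B:", np.allclose(layer.forward(x), x @ W.T))
    print(layer.num_patch_params(), "patch params vs", W.size, "frozen params")
```

In this sketch only A, B, and the gate would be trained, and W stays frozen. Setting the gate to zero (or removing the patch) restores the original layer exactly, which illustrates what "plug-and-play" means here.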

Paper Structure

This paper contains 27 sections, 6 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Illustration of proactive Deepfake defenses and the LoRA patching bypass. (a) Proactive defenses embed invisible adversarial perturbations into images to disrupt Deepfakes. (b) LoRA patching inserts LoRA blocks into each linear, convolutional, and transposed convolutional layer, adding only a small number of parameters relative to the full model. The patched model still produces the intended manipulation on protected images, so the defense no longer disrupts it, and it continues to manipulate benign images as before.
  • Figure 2: Illustration of the LoRA patching fine-tuning process. Fine-tuning uses a bi-level min-max optimization based on adversarial training. The inner maximization uses PGD to generate adversarial examples against the currently patched Deepfake model, and the outer minimization fine-tunes the LoRA patch on those examples.
  • Figure 3: Illustration of LoRA patch embedding. A pair of LoRA blocks is inserted into each convolutional and deconvolutional (transposed convolutional) layer of the Deepfake model to adjust its output. Each layer also includes a learnable gating parameter that adaptively controls how strongly the patch influences that layer.
  • Figure 4: Illustration of textual descriptions generated for example images by BLIP's vision–language encoder (li2022blip).
  • Figure 5: Quantitative results of different methods bypassing proactive defenses in leakage scenarios.
  • ...and 3 more figures
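
The inner maximization in Figure 2 can be illustrated with a toy L∞ PGD loop. Several simplifying assumptions are made here. The "generator" is a linear map M, so the gradient of the squared output distortion has a closed form; a real Deepfake model is nonlinear and would rely on autodiff. The budget eps = 8/255, the step size 2/255, and the 10 iterations are placeholder values, not settings from the paper. The names `pgd_attack` and `distortion` are hypothetical. The outer step, in which the LoRA patch is updated so that the patched model's output on x + δ matches the desired output, is omitted.

```python
import numpy as np


def distortion(M, delta):
    """Squared L2 change in a linear toy model's output caused by perturbation delta.

    For a linear model, ||M(x + delta) - M x||^2 reduces to ||M delta||^2.
    """
    return float(np.sum((M @ delta) ** 2))


def pgd_attack(M, x, eps=8 / 255, step=2 / 255, n_steps=10, seed=0):
    """Inner maximization: L_inf PGD that maximizes ||M(x + delta) - M x||^2.

    M stands in for the current (patched) generator. Returns the final
    perturbation and the distortion at the random starting point.
    """
    rng = np.random.default_rng(seed)
    lo = np.maximum(-eps, -x)        # keep x + delta >= 0
    hi = np.minimum(eps, 1.0 - x)    # keep x + delta <= 1
    delta = np.clip(rng.uniform(-eps, eps, size=x.shape), lo, hi)  # random start
    start_loss = distortion(M, delta)
    for _ in range(n_steps):
        grad = 2.0 * M.T @ (M @ delta)                          # gradient of ||M delta||^2
        delta = np.clip(delta + step * np.sign(grad), lo, hi)   # signed ascent, then project
    return delta, start_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.normal(size=(8, 16))         # toy linear "generator"
    x = rng.uniform(0.0, 1.0, size=16)   # toy image with pixels in [0, 1]
    delta, start_loss = pgd_attack(M, x)
    print("max |delta|:", np.abs(delta).max())
    print("distortion: start", start_loss, "-> after PGD", distortion(M, delta))
```

Because the toy objective is convex and each projected sign step moves every coordinate toward its gradient sign (or leaves it in place), the distortion never decreases across iterations. The perturbation stays within both the L∞ budget and the valid pixel range.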