Towards Imperceptible Adversarial Defense: A Gradient-Driven Shield against Facial Manipulations

Yue Li, Linying Xue, Dongdong Lin, Qiushi Li, Hui Tian, Hongxia Wang

TL;DR

This work introduces GRASP, a gradient-projection-based proactive defense against facial deepfakes that disrupts forgery while preserving perceptual fidelity. It unifies defense-focused $L_{\text{MSE}}$, perceptual $L_{\text{SSIM}}$, and low-frequency $L_{\text{LF}}$ losses, and resolves gradient conflicts via a cross-gradient projection scheme with Gaussian smoothing. The approach achieves near-ideal defense performance (e.g., $\text{DSR}$ around 100% and $\text{PSNR} \approx 40$ dB) across multiple deepfake models and datasets, while delivering high visual quality and robustness under post-processing. These results demonstrate GRASP's strong generalizability and practical impact for safeguarding identities and attributes in manipulated media, with potential extension to video-domain deepfakes and evolving generative architectures.

Abstract

With the rapid proliferation of generative models, manipulated facial images have become increasingly accessible, raising concerns regarding privacy infringement and societal trust. In response, proactive defense strategies embed adversarial perturbations into facial images to counter deepfake manipulation. However, existing methods often face a tradeoff between imperceptibility and defense effectiveness: strong perturbations may disrupt forgeries but degrade visual fidelity. Recent studies have attempted to address this issue by introducing additional visual loss constraints, yet often overlook the underlying gradient conflicts among losses, ultimately weakening defense performance. To bridge the gap, we propose a gradient-projection-based adversarial proactive defense (GRASP) method that effectively counters facial deepfakes while minimizing perceptual degradation. GRASP is the first approach to successfully integrate both structural similarity loss and low-frequency loss to enhance perturbation imperceptibility. By analyzing gradient conflicts between the defense effectiveness loss and the visual quality losses, GRASP introduces a gradient-projection mechanism to mitigate these conflicts, enabling balanced optimization that preserves image fidelity without sacrificing defensive performance. Extensive experiments validate the efficacy of GRASP, which achieves a PSNR exceeding 40 dB, an SSIM of 0.99, and a 100% defense success rate against facial attribute manipulations, significantly outperforming existing approaches in visual quality.
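To give a sense of scale for the PSNR figure quoted above, the standard PSNR definition can be sketched as follows. This is a minimal illustration assuming 8-bit images (peak value 255); the function name `psnr` and the synthetic image and perturbation are hypothetical and are not taken from the paper's experiments.

```python
import numpy as np

def psnr(original, perturbed, peak=255.0):
    """Standard peak signal-to-noise ratio in dB (assumes a peak value of 255)."""
    original = np.asarray(original, dtype=np.float64)
    perturbed = np.asarray(perturbed, dtype=np.float64)
    mse = np.mean((original - perturbed) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Synthetic example (not paper data): every pixel is shifted by +/-2 gray levels,
# so the mean squared error is exactly 4.
rng = np.random.default_rng(0)
image = rng.integers(2, 254, size=(64, 64, 3)).astype(np.float64)
perturbation = rng.choice([-2.0, 2.0], size=image.shape)
adversarial = image + perturbation

value = psnr(image, adversarial)
print(f"PSNR: {value:.2f} dB")
```

Under these assumptions, a perturbation of about two gray levels per pixel lands slightly above 40 dB, which suggests how small a perturbation must be to reach the reported fidelity range.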

Paper Structure

This paper contains 24 sections, 17 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: Diagrams of passive detection and proactive defense. (a) Passive Detection: A detector is employed to determine whether an image is forged. (b) Proactive Defense: Perturbations are added to the original image to disrupt the forgery process of deepfake models.
  • Figure 2: Overview of GRASP. The proposed method increases the MSE loss between the outputs of the forgery model when given original and adversarial facial images as input, while simultaneously minimizing the SSIM loss and low-frequency loss between the original and adversarial images. Gradient projection is employed to mitigate gradient conflicts. Low-Frequency Constraint denotes the construction of the low-frequency loss, while Adversarial Image Generation illustrates the process of crafting adversarial facial images.
  • Figure 3: Gradient projection strategy to resolve gradient conflicts. (a) The gradient directions $g_i$ and $g_j$ are in conflict. (b) To resolve the conflict, $g_i$ and $g_j$ are mutually projected onto each other's normal planes. (c) Conflict-free gradients after projection are indicated.
  • Figure 4: Experimental results of GRASP across different models and datasets. Subfigures (a)-(c) show the performance of GRASP on the StarGAN model using the CelebA, FFHQ, and LFW datasets, evaluated under varying numbers of input images (50, 100, 500, 1000, and the overall average) with respect to the DSR, PSNR, and LF metrics. Subfigures (d)-(f) report the corresponding results for the AttGAN model under the same settings, and subfigures (g)-(i) report those for the HiSD model.
  • Figure 5: Visualization examples of disrupting attribute editing. For each target model, the first row shows the deepfake model's forgery results on the original images, while the second row displays the deepfake model's output on the adversarial facial images.
  • ...and 3 more figures
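
The mutual projection described in the Figure 3 caption can be sketched in a few lines. This is a minimal NumPy illustration, assuming that a conflict is detected by a negative inner product and that each gradient is projected onto the normal plane of the other; the function name `project_conflicting` and the toy vectors are hypothetical, and the paper's full scheme (which also involves Gaussian smoothing) may differ in detail.

```python
import numpy as np

def project_conflicting(g_i, g_j, eps=1e-12):
    """Mutually project two gradients if they conflict (negative inner product).

    Assumption: each gradient loses its component along the other *original*
    gradient, so each result lies in the other's normal plane.
    """
    g_i = np.asarray(g_i, dtype=np.float64)
    g_j = np.asarray(g_j, dtype=np.float64)
    dot = float(np.dot(g_i.ravel(), g_j.ravel()))
    if dot >= 0.0:
        # No conflict: leave both gradients unchanged.
        return g_i.copy(), g_j.copy()
    norm_i_sq = float(np.dot(g_i.ravel(), g_i.ravel())) + eps
    norm_j_sq = float(np.dot(g_j.ravel(), g_j.ravel())) + eps
    p_i = g_i - (dot / norm_j_sq) * g_j  # g_i projected onto the normal plane of g_j
    p_j = g_j - (dot / norm_i_sq) * g_i  # g_j projected onto the normal plane of g_i
    return p_i, p_j

# Toy 2-D example with conflicting directions (illustrative values only).
g_i = np.array([1.0, 0.0])
g_j = np.array([-1.0, 1.0])
p_i, p_j = project_conflicting(g_i, g_j)
print(p_i, p_j)
```

After projection, each gradient is orthogonal to the other original gradient, so an update along `p_i` no longer directly opposes the objective behind `g_j`, and vice versa.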