Diffusion-Guided Adversarial Perturbation Injection for Generalizable Defense Against Facial Manipulations

Yue Li, Linying Xue, Kaiqing Lin, Hanyu Quan, Dongdong Lin, Hui Tian, Hongxia Wang, Bin Wang

Abstract

Recent advances in GAN and diffusion models have significantly improved the realism and controllability of facial deepfake manipulation, raising serious concerns regarding privacy, security, and identity misuse. Proactive defenses attempt to counter this threat by injecting adversarial perturbations into images before manipulation takes place. However, existing approaches remain limited in effectiveness due to suboptimal perturbation injection strategies and are typically designed under white-box assumptions, targeting only simple GAN-based attribute editing. These constraints hinder their applicability in practical real-world scenarios. In this paper, we propose AEGIS, the first diffusion-guided paradigm in which AdvErsarial facial images are Generated for Identity Shielding. We observe that the limited defense capability of existing approaches stems from the peak-clipping constraint, where perturbations are forcibly truncated by a fixed $L_\infty$ bound. To overcome this limitation, instead of directly modifying pixels, AEGIS injects adversarial perturbations into the latent space along the DDIM denoising trajectory, thereby decoupling the perturbation magnitude from pixel-level constraints and allowing perturbations to adaptively amplify where they are most effective. The extensible design of AEGIS allows the defense to move beyond purely white-box use and support black-box scenarios through a gradient-estimation strategy. Extensive experiments across GAN- and diffusion-based deepfake generators show that AEGIS consistently delivers strong defense effectiveness while maintaining high perceptual quality. In white-box settings, it achieves robust manipulation disruption, whereas in black-box settings, it demonstrates strong cross-model transferability.
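The core idea above, injecting perturbations at selected steps of the DDIM denoising trajectory rather than clipping them in pixel space, can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `toy_denoise_step` stands in for a real DDIM update with a pretrained diffusion model, and `toy_gradient` stands in for gradients obtained from the target deepfake model (directly in white-box, or via gradient estimation in black-box); all names and parameters here are hypothetical.

```python
import numpy as np

def toy_denoise_step(x_t, t, T):
    """Stand-in for one DDIM denoising update (a real model would
    predict noise and step the latent toward the data manifold)."""
    return x_t * (1.0 - 1.0 / (T + 1))

def toy_gradient(x_t):
    """Stand-in for the adversarial gradient from the deepfake model
    (white-box) or its black-box estimate."""
    return np.sign(x_t)

def aegis_style_denoise(x_T, T=50, inject_steps=(40, 30, 20, 10), alpha=0.05):
    """Run T denoising steps; at selected steps, add an adversarial
    perturbation directly in latent space, so its magnitude is not
    truncated by a pixel-space L_inf bound."""
    x_t = x_T.copy()
    for t in range(T, 0, -1):
        x_t = toy_denoise_step(x_t, t, T)
        if t in inject_steps:
            x_t = x_t + alpha * toy_gradient(x_t)  # latent-space injection
    return x_t

rng = np.random.default_rng(0)
x_T = rng.standard_normal((8, 8))          # toy noisy latent
x_adv = aegis_style_denoise(x_T)           # with injection
x_clean = aegis_style_denoise(x_T, inject_steps=())  # plain denoising
```

The key point the sketch captures is that the perturbation is applied between denoising steps, so later steps partially absorb it into a natural-looking output instead of clipping it at a fixed pixel budget.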

Paper Structure

This paper contains 31 sections, 16 equations, 9 figures, 13 tables, 2 algorithms.

Figures (9)

  • Figure 1: Defense taxonomy. Passive detection identifies deepfakes post-generation, whereas proactive defense prevents manipulation. AEGIS is the first method compatible with both white-box and black-box proactive defense settings.
  • Figure 2: Overview of AEGIS. AEGIS generates adversarial facial images by injecting perturbations into the DDIM denoising trajectory. (a) Diffusion Process: the input face $x$ is forward-diffused to a noisy latent $x_{T_1}$. (b) Gradient Acquisition: obtain gradients from the deepfake model—directly in white-box, or via gradient estimation in black-box—and compute the perturbation from these gradients. (c) Adversarial Perturbation Injection: the computed perturbations are progressively added at selected DDIM denoising steps to guide reconstruction toward the final adversarial output $x^{adv}$.
  • Figure 3: Visualization of defense effectiveness for SOTA methods and the proposed AEGIS under the white-box scenario.
  • Figure 4: Visualization of perturbation imperceptibility for SOTA methods and AEGIS under the white-box scenario.
  • Figure 5: Robustness comparison with SOTA methods in white-box settings.
  • ...and 4 more figures