FaceShield: Defending Facial Image against Deepfake Threats

Jaehwan Jeong, Sumin In, Sieun Kim, Hannie Shin, Jongheon Jeong, Sang Ho Yoon, Jaewook Chung, Sangpil Kim

TL;DR

FaceShield introduces a proactive, transferable defense against deepfakes by perturbing conditioning flows in diffusion models and disrupting facial feature extractors. It combines conditioned-face attacks, multi-backbone feature extractor perturbations, and an enhanced noise update mechanism with Gaussian blur and low-pass filtering to achieve imperceptible yet robust protection. The method demonstrates state-of-the-art protection against diffusion-model deepfakes, with transferability to GAN-based attacks and robustness to JPEG compression and purification techniques. Its extensibility to various deepfake pipelines and reduced computational cost make it a practical defense for real-world deployments.

Abstract

The rising use of deepfakes in criminal activities presents a significant issue, inciting widespread controversy. While numerous studies have tackled this problem, most primarily focus on deepfake detection. These reactive solutions are insufficient as a fundamental approach for crimes where authenticity is disregarded. Existing proactive defenses also have limitations, as they are effective only for deepfake models based on specific Generative Adversarial Networks (GANs), making them less applicable in light of recent advancements in diffusion-based models. In this paper, we propose a proactive defense method named FaceShield, which introduces novel defense strategies targeting deepfakes generated by Diffusion Models (DMs) and facilitates defenses on various existing GAN-based deepfake models through facial feature extractor manipulations. Our approach consists of three main components: (i) manipulating the attention mechanism of DMs to exclude protected facial features during the denoising process, (ii) targeting prominent facial feature extraction models to enhance the robustness of our adversarial perturbation, and (iii) employing Gaussian blur and low-pass filtering techniques to improve imperceptibility while enhancing robustness against JPEG compression. Experimental results on the CelebA-HQ and VGGFace2-HQ datasets demonstrate that our method achieves state-of-the-art performance against the latest deepfake models based on DMs, while also exhibiting transferability to GANs and showcasing greater imperceptibility of noise along with enhanced robustness. Code is available here: https://github.com/kuai-lab/iccv25_faceshield
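
The low-pass filtering in component (iii) can be illustrated with a minimal sketch. It assumes a DCT-based mask that keeps only the lowest `keep` frequencies of a square perturbation. The abstract does not specify the exact filter, so `dct_matrix`, `low_pass`, `energy`, and the `keep` parameter are hypothetical names chosen for illustration and are not the authors' implementation.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def low_pass(noise, keep):
    """Zero every DCT coefficient with frequency index >= keep (assumed mask)."""
    n = len(noise)
    d = dct_matrix(n)
    coeffs = matmul(matmul(d, noise), transpose(d))   # forward 2D DCT
    for u in range(n):
        for v in range(n):
            if u >= keep or v >= keep:
                coeffs[u][v] = 0.0                    # drop high frequencies
    return matmul(matmul(transpose(d), coeffs), d)    # inverse 2D DCT


def energy(m):
    """Sum of squared entries."""
    return sum(x * x for row in m for x in row)


if __name__ == "__main__":
    n = 8
    # A checkerboard is an extreme high-frequency pattern.
    checker = [[(-1.0) ** (i + j) for j in range(n)] for i in range(n)]
    filtered = low_pass(checker, keep=4)
    print(energy(checker), energy(filtered))
```

Under this assumption, a smooth perturbation passes through almost unchanged, while pixel-level alternating patterns are strongly attenuated. JPEG quantization generally preserves low-frequency coefficients better than high-frequency ones, which is one plausible reading of why low-frequency noise survives compression.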

Paper Structure

This paper contains 28 sections, 13 equations, 30 figures, 8 tables, 3 algorithms.

Figures (30)

  • Figure 1: Protecting faces against deepfakes using FaceShield. Pure (unprotected) images are vulnerable to face swapping, allowing the target image's face to be easily reflected. In contrast, images protected by FaceShield conceal facial features from deepfake models. Code is available here: https://github.com/kuai-lab/iccv25_faceshield
  • Figure 2: Image editing and Deepfake processes in DMs. (a) In DM-based image editing, a single image is input as a query $Q$ and edited based on a prompt condition. (b) In DM-based deepfake, two images are used, with the target image serving as the query $Q$ while the source image acts as the condition for swapping. This condition operates as key $K$ and value $V$ in the cross-attention layer.
  • Figure 3: Overview. Our method has three main parts: (i) Conditioned face attack, which disrupts feature transfer by targeting the embedding process and the attention map variance in the cross-attention layer; (ii) Facial feature extractor attack, which lowers the face detection probability and disrupts feature extraction; and (iii) Enhanced noise update, which improves imperceptibility by applying Gaussian blur to regions with significant intensity changes between adjacent pixels, and increases robustness against JPEG compression distortion by encoding the noise in the low-frequency domain.
  • Figure 4: Qualitative Results. Protection performance across various deepfake models when our adversarial noise is applied. Models [wang2024face, ye2023ip] highlighted in the orange box typically exhibit facial distortions due to the influence described in Sec. \ref{sec:diff_attack}, while those [zhao2023diffswap, kim2212diffface, chen2020simswap, gao2021information] in the blue box display newly generated faces that diverge from the source image, attributed to the impact detailed in Sec. \ref{sec:fe_attack}.
  • Figure 5: We generate deepfake results with [ye2023ip] from images protected by the methods [liang2023adversarial, liang2023mist, salman2023raising, xue2023toward]. While these fail to disrupt deepfake generation, our method causes the deepfake to malfunction.
  • ...and 25 more figures
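
The cross-attention flow described in the Figure 2 and Figure 3 captions can be sketched as follows. This is a minimal single-head illustration, assuming the target image supplies the queries $Q$ and the source-face condition supplies the keys $K$ and values $V$. The function names and the `attention_variance` helper are hypothetical; the captions state that the attack targets the attention map variance but do not give the loss, so the helper only measures that quantity.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: Q from the target, K and V from the condition."""
    d = len(keys[0])
    outputs, attn_map = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        attn_map.append(weights)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs, attn_map


def attention_variance(attn_map):
    """Variance of all attention weights (hypothetical diagnostic)."""
    flat = [w for row in attn_map for w in row]
    mean = sum(flat) / len(flat)
    return sum((w - mean) ** 2 for w in flat) / len(flat)


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]   # two target-image tokens
    k = [[2.0, 0.0], [0.0, 2.0]]   # two source-condition tokens
    v = [[1.0], [3.0]]
    out, attn = cross_attention(q, k, v)
    print(out, attention_variance(attn))
```

When every key is identical, the attention weights become uniform, the variance is zero, and each output is the plain average of the values. In that case no particular source feature is selected, which gives an intuition for why the attention map's variance is a natural quantity for a conditioning attack to target.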