DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection

Yuhao Sun, Lingyun Yu, Hongtao Xie, Jiaming Li, Yongdong Zhang

TL;DR

DiffAM is a diffusion-based framework that protects facial privacy by transferring adversarial makeup from a reference image onto a face image. It splits the task into two stages: text-guided makeup removal, which establishes a deterministic relationship between the makeup and non-makeup domains in CLIP space, and image-guided adversarial makeup transfer, which is driven by a CLIP-based makeup loss and an ensemble attack strategy for black-box transferability. The method reports higher visual quality and attack success rates than prior noise-based and makeup-based approaches against multiple FR models and commercial APIs. Using DDIM inversion together with directional and pixel-level losses, DiffAM preserves makeup-irrelevant details while transferring makeup precisely, supporting practical privacy protection in real-world scenarios.

Abstract

With the rapid development of face recognition (FR) systems, the privacy of face images on social media is facing severe challenges due to the abuse of unauthorized FR systems. Some studies utilize adversarial attack techniques to defend against malicious FR systems by generating adversarial examples. However, the generated adversarial examples, i.e., the protected face images, tend to suffer from subpar visual quality and low transferability. In this paper, we propose a novel face protection approach, dubbed DiffAM, which leverages the powerful generative ability of diffusion models to generate high-quality protected face images with adversarial makeup transferred from reference images. To be specific, we first introduce a makeup removal module that generates non-makeup images using a fine-tuned diffusion model under the guidance of textual prompts in CLIP space. As the inverse process of makeup transfer, makeup removal makes it easier to establish the deterministic relationship between the makeup domain and the non-makeup domain without requiring elaborate text prompts. Then, with this relationship, a CLIP-based makeup loss along with an ensemble attack strategy is introduced to jointly guide the direction toward the adversarial makeup domain, achieving the generation of protected face images with natural-looking makeup and high black-box transferability. Extensive experiments demonstrate that DiffAM achieves higher visual quality and attack success rates, with a gain of 12.98% under the black-box setting compared with the state of the art. The code will be available at https://github.com/HansSunY/DiffAM.
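
The ensemble attack strategy mentioned in the abstract can be illustrated with a minimal sketch. It assumes an impersonation-style objective in which the protected image's embedding is pulled toward a target identity's embedding under each of K surrogate FR models, and the per-model losses are averaged. The function names, the toy two-dimensional embeddings, and the exact form of the loss are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ensemble_attack_loss(protected_embeddings, target_embeddings):
    """Average of (1 - cosine similarity) over K surrogate FR models.

    protected_embeddings[k] and target_embeddings[k] are hypothetical
    embeddings of the protected image and of the target identity
    produced by the k-th surrogate model (assumed, not from the paper).
    """
    per_model = [
        1.0 - cosine_similarity(p, t)
        for p, t in zip(protected_embeddings, target_embeddings)
    ]
    return sum(per_model) / len(per_model)


# Toy example with K = 2 surrogate models and 2-D embeddings.
protected = [[1.0, 0.0], [0.0, 1.0]]  # model 1 matches, model 2 is orthogonal
target = [[1.0, 0.0], [1.0, 0.0]]
loss = ensemble_attack_loss(protected, target)
```

In this toy case the first surrogate already matches the target (loss 0) and the second is orthogonal to it (loss 1), so the averaged loss is 0.5. Averaging over several surrogates is one common way to avoid overfitting to a single model, which is the usual motivation for ensembles in black-box transfer attacks.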

Paper Structure

This paper contains 20 sections, 21 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Core idea comparison. The text-guided method generates adversarial makeup from only a pair of textual prompts. The coarse-grained guidance of text results in unexpected makeup generation (as shown in red boxes). Our method introduces a makeup removal module that shifts the task from text-based guidance to image-based guidance and controls the direction and distance of refined adversarial makeup generation (as shown in green boxes).
  • Figure 2: Overview of DiffAM. DiffAM is a two-stage framework that generates the protected face image $x^\prime$ by transferring the makeup style of $y$ to $x$. Specifically, in the text-guided makeup removal module, we input a reference image $y$ and obtain the non-makeup image $\hat{y}$ through text guidance, which determines the precise makeup direction. Then, in the image-guided adversarial makeup transfer module, we input a face image $x$ and obtain the adversarial-makeup image $x^\prime$ through the image guidance of $y$ and $\hat{y}$, along with an ensemble attack strategy.
  • Figure 3: The process of adversarial makeup transfer in CLIP space. (a) It is challenging to directly find a precise path from the non-makeup domain to the adversarial makeup domain. (b) The process of text-guided makeup removal can help establish the relationship between domains. (c) The inverse direction of makeup removal indicates the direction to the makeup domain for makeup transfer, and the pixel-level makeup loss guides the distance to the makeup domain. (d) The directions of the ensemble attack and makeup transfer jointly guide the final direction to the adversarial makeup domain.
  • Figure 4: Visualizations of the protected face images generated by different facial privacy protection methods on CelebA-HQ. The green and blue numbers below each image are confidence scores returned by Face++ and Aliyun.
  • Figure 5: The confidence scores (higher is better) returned from commercial APIs, Face++ and Aliyun. DiffAM has higher and more stable confidence scores than state-of-the-art noise-based and makeup-based facial privacy protection methods.
  • ...and 8 more figures
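
The direction guidance described in the Figure 3 caption can be sketched as a directional loss in an embedding space. The sketch assumes that the makeup direction is the embedding difference between the reference $y$ and its makeup-removed version $\hat{y}$, and that the edit applied to the face image ($x$ to $x^\prime$) is encouraged to be parallel to it. The embeddings here are toy vectors standing in for CLIP image embeddings, and all function names are hypothetical.

```python
import math


def subtract(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def makeup_direction_loss(e_x, e_x_prime, e_y, e_y_hat):
    """1 - cos(edit direction, reference makeup direction).

    e_x, e_x_prime: embeddings of the input face and its edited version.
    e_y, e_y_hat:   embeddings of the makeup reference and its
                    makeup-removed counterpart.
    All four are assumed stand-ins for CLIP image embeddings.
    """
    edit_direction = subtract(e_x_prime, e_x)   # x -> x'
    makeup_direction = subtract(e_y, e_y_hat)   # y_hat -> y
    return 1.0 - cosine_similarity(edit_direction, makeup_direction)


# Toy 2-D embeddings.
e_y_hat, e_y = [0.0, 0.0], [1.0, 1.0]           # makeup direction is (1, 1)
e_x = [1.0, 0.0]
aligned_loss = makeup_direction_loss(e_x, [2.0, 1.0], e_y, e_y_hat)
orthogonal_loss = makeup_direction_loss(e_x, [2.0, -1.0], e_y, e_y_hat)
```

An edit parallel to the reference makeup direction gives a loss near 0, while an orthogonal edit gives a loss near 1. A directional term of this kind constrains only the direction of the edit, which is consistent with the caption's statement that a separate pixel-level makeup loss guides the distance.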