PuFace: Defending against Facial Cloaking Attacks for Facial Recognition Models

Jing Wen

TL;DR

PuFace is an image purification system that uses the generalization ability of neural networks to diminish the impact of cloaks: it pushes cloaked images towards the manifold of natural (uncloaked) images before facial recognition models are trained on them.

Abstract

Recently proposed facial cloaking attacks add invisible perturbations (cloaks) to facial images to protect users from being recognized by unauthorized facial recognition models. However, we show that these cloaks are not robust enough and can be removed from images. This paper introduces PuFace, an image purification system that leverages the generalization ability of neural networks to diminish the impact of cloaks by pushing cloaked images towards the manifold of natural (uncloaked) images before facial recognition models are trained. Specifically, we devise a purifier that takes all training images, both cloaked and natural, as input and generates purified facial images close to the manifold where natural images lie. To meet the defense goal, we propose training the purifier on deliberately amplified cloaked images with a loss function that combines image loss and feature loss. Our experiments show that PuFace effectively defends against two state-of-the-art facial cloaking attacks, reducing the attack success rate from 69.84% to 7.61% on average without degrading the normal accuracy of various facial recognition models. Moreover, PuFace is a model-agnostic defense mechanism that can be applied to any facial recognition model without modifying the model structure.
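The abstract names two training ingredients: amplified cloaks and a loss that combines an image term with a feature term. The NumPy sketch below illustrates one plausible reading of both. Everything concrete in it is an assumption, not the paper's formulation: the mean-squared-error form of each term, the `feature_weight` parameter, the `[0, 1]` pixel range, and the toy linear `toy_features` extractor standing in for a real face-feature network.

```python
import numpy as np


def amplify_cloak(natural, cloaked, factor):
    """Scale the cloak (cloaked - natural) by `factor` and re-apply it.

    Pixels are assumed to lie in [0, 1]; this range is an assumption.
    """
    cloak = cloaked - natural
    return np.clip(natural + factor * cloak, 0.0, 1.0)


def combined_loss(purified, natural, extract_features, feature_weight=1.0):
    """Image loss plus weighted feature loss, both as mean squared error.

    The MSE form and `feature_weight` are illustrative assumptions.
    """
    image_loss = float(np.mean((purified - natural) ** 2))
    feature_gap = extract_features(purified) - extract_features(natural)
    feature_loss = float(np.mean(feature_gap ** 2))
    return image_loss + feature_weight * feature_loss


# Synthetic stand-ins for a batch of natural images and their cloaks.
rng = np.random.default_rng(0)
natural = rng.uniform(0.2, 0.8, size=(4, 8, 8, 3))
cloak = rng.uniform(-0.02, 0.02, size=natural.shape)
cloaked = natural + cloak

# Hypothetical feature extractor: a fixed random linear projection.
projection = rng.normal(size=(8 * 8 * 3, 16))


def toy_features(images):
    return images.reshape(len(images), -1) @ projection


amplified = amplify_cloak(natural, cloaked, factor=3.0)
loss_identity = combined_loss(natural, natural, toy_features)
loss_cloaked = combined_loss(cloaked, natural, toy_features)
loss_amplified = combined_loss(amplified, natural, toy_features)
```

In this toy setup both terms grow with the squared cloak magnitude, so an amplified cloak gives a much larger loss than the original one. Whether that is why amplification helps the purifier is not something this sketch establishes.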

Paper Structure (44 sections, 3 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: PCA of feature vectors extracted from facial images. user0 uses Fawkes to cloak their training images, while user1 and user2 are regular users (both their training and testing images are natural). A facial recognition model trained on all training images will misidentify the (natural) test images of user0. After PuFace purifies all the training images, the cloaked images of user0 are pulled to the manifold of natural images, while the natural training images of user1 and user2 stay there.
  • Figure 2: Attacks and defenses for facial recognition models. When the model is trained on Aaron Eckhart's cloaked images, it misidentifies his natural images as Johnny Depp. After PuFace purifies the training images, the model trained on the purified images correctly identifies Aaron Eckhart.
  • Figure 3: The reconstructed results of different defenses. Subfigure (a) shows the natural images, the cloaked images, and the cloaks between them. The following five subfigures give visual comparisons of different defenses. In each, the first and third rows present the reconstructed images of the natural and cloaked images respectively, and the second and last rows show the difference between them and the natural images.
  • Figure 4: We train PuFace on cloaks amplified from 1× to 10× to show how the defense performance and the purified images vary.
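
The kind of view described in the Figure 1 caption, a 2-D PCA projection of feature vectors, can be sketched as follows. This is an illustrative example on synthetic data, not the paper's pipeline: the 32-dimensional features, the cluster centers, and the `shift` that displaces the cloaked images of user0 are all made-up assumptions, and `pca_project` is a hypothetical helper built on a plain SVD.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project row-wise feature vectors onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)

# Synthetic "natural" features for three users, 20 images each (assumed sizes).
centers = rng.normal(scale=5.0, size=(3, 32))
natural_feats = np.concatenate(
    [center + rng.normal(size=(20, 32)) for center in centers]
)

# Synthetic "cloaked" features for user0: the same identity, displaced in
# feature space by a made-up shift.
shift = rng.normal(scale=3.0, size=32)
cloaked_user0 = centers[0] + shift + rng.normal(size=(20, 32))

all_feats = np.concatenate([natural_feats, cloaked_user0])
projected = pca_project(all_feats)  # one 2-D point per image
```

Plotting `projected` with one color per group would show whether the cloaked cluster sits apart from the natural images of user0; a purifier that works as described would move it back.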