Facial Misrecognition Systems: Simple Weight Manipulations Force DNNs to Err Only on Specific Persons

Irad Zehavi; Roee Nitzan; Adi Shamir

TL;DR

This work shows that facial recognition systems based on Siamese networks can be covertly tampered with via Weight Surgery, a simple linear transformation of the last layer's weights, to plant two kinds of backdoors: Shattered Class (any two images of a target identity are judged to be different people) and Merged Classes (two target identities are judged to be the same person). The backdoors require no retraining and no input-time triggers, and multiple independent backdoors can coexist with minimal cross-interference, enabling anonymity and impersonation attacks. Experiments on FaceNet with LFW and SLLFW show high attack success rates (roughly 97–99%) and negligible degradation of benign accuracy, even with ten independent backdoors in a single model. The paper also discusses detection based on changes in weight-matrix rank, and a hiding variant that preserves the singular-value distribution. Together, these results point to a practical threat to open-set facial verification systems and to the need for robust defenses against identity-targeted backdoors.

Abstract

In this paper, we describe how to plant novel types of backdoors in any facial recognition model based on the popular architecture of deep Siamese neural networks. These backdoors force the system to err only on natural images of specific persons who are preselected by the attacker, without controlling their appearance or inserting any triggers. For example, we show how such a backdoored system can classify any two images of a particular person as different people, or any two images of a particular pair of persons as the same person, with almost no effect on the correctness of its decisions for other persons. Surprisingly, we show that both types of backdoors can be implemented by applying linear transformations to the model's last weight matrix, with no additional training or optimization, using only images of the backdoor identities. A unique property of our attack is that multiple backdoors can be independently installed in the same model by multiple attackers, who may not be aware of each other's existence, with almost no interference. We have experimentally verified the attacks on a SOTA facial recognition system. When we tried to individually anonymize ten celebrities, the network failed to recognize two of their images as being the same person between $97.02\%$ and $98.31\%$ of the time. When we tried to confuse the extremely different-looking Morgan Freeman and Scarlett Johansson, for example, their images were declared to be the same person $98.47\%$ of the time. For each type of backdoor, we sequentially installed multiple backdoors with minimal effect on each other's performance (for example, anonymizing all ten celebrities on the same model reduced the success rate for each celebrity by no more than $1.01\%$). In all of our experiments, the benign accuracy of the network on other persons barely degraded (in most cases, by less than $0.05\%$).
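To make the idea of a last-layer linear transformation concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes that the transformation is an orthogonal projection applied to the embedding space: removing the centroid direction of one identity for Shattered Class, and removing the difference between two centroids for Merged Classes. The abstract only states that the transformations are linear, so this specific choice, the helper names (`remove_direction`, `shattered_class`, `merged_classes`), the 4-dimensional vectors, and the identity weight matrix are all illustrative assumptions.

```python
import math

def normalize(v):
    """Scale v to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(u, v):
    """Cosine similarity, the usual score for comparing two embeddings."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def remove_direction(W, direction):
    """Return P @ W, where P = I - d d^T projects out the unit vector d."""
    d = normalize(direction)
    n = len(d)
    P = [[(1.0 if i == j else 0.0) - d[i] * d[j] for j in range(n)]
         for i in range(n)]
    return matmul(P, W)

def shattered_class(W, class_embeddings):
    """Assumed SC construction: project out the identity's centroid direction."""
    return remove_direction(W, mean(class_embeddings))

def merged_classes(W, embeddings_a, embeddings_b):
    """Assumed MC construction: project out the difference of the two centroids."""
    ca, cb = mean(embeddings_a), mean(embeddings_b)
    return remove_direction(W, [x - y for x, y in zip(ca, cb)])

# Toy last layer: identity, so penultimate activations equal embeddings.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Images of person A available to the attacker, plus two held-out images.
person_a_known = [[1.0, 0.2, 0.0, -0.1], [1.0, -0.1, 0.2, 0.0], [1.0, 0.0, -0.1, 0.2]]
a4 = [1.0, 0.15, -0.1, 0.05]
a5 = [1.0, -0.1, 0.1, -0.15]

# Two images of an unrelated person B.
b1 = [0.1, 1.0, 0.05, -0.1]
b2 = [-0.05, 1.0, 0.1, 0.1]

W_sc = shattered_class(W, person_a_known)
W_mc = merged_classes(W, person_a_known, [b1, b2])

print("A vs A, clean model:      ", cosine(matvec(W, a4), matvec(W, a5)))
print("A vs A, Shattered Class:  ", cosine(matvec(W_sc, a4), matvec(W_sc, a5)))
print("B vs B, Shattered Class:  ", cosine(matvec(W_sc, b1), matvec(W_sc, b2)))
print("A vs B, clean model:      ", cosine(matvec(W, a4), matvec(W, b1)))
print("A vs B, Merged Classes:   ", cosine(matvec(W_mc, a4), matvec(W_mc, b1)))
```

In this toy setting, the two held-out images of person A stop looking alike once their shared direction is removed, the unrelated person B is barely affected, and A and B become similar once the direction separating their centroids is removed. Under the projection assumption, the modified matrix also has lower rank than the original, which is consistent with the rank-based detection mentioned in the TL;DR.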

Paper Structure (38 sections, 12 equations, 9 figures, 6 tables)

Introduction
Related Work
One-Shot Learning, Open-Set Recognition, and Facial Recognition
Fine-Tuning
Inference-Time Attacks
Backdoor Attacks
Multiple Backdoors in the Same Model
Defenses
Overview of Existing Defense Mechanisms
The Backdoors
Notation
Shattered Class
Merged Classes
Feature Spaces of Similarity Models
Projections of linear spaces
...and 23 more sections

Figures (9)

Figure 1: Class-based backdoors applied to unmodified natural images with no artificial triggers
Figure 2: Angle distribution in FaceNet's feature space for different datasets
Figure 3: MNIST 3D feature space
Figure 4: Effects of linear operations on different classes in feature space
Figure 5: LFW angle distribution in the feature space of a FaceNet model backdoored with SC
...and 4 more figures
