Facial Misrecognition Systems: Simple Weight Manipulations Force DNNs to Err Only on Specific Persons
Irad Zehavi, Roee Nitzan, Adi Shamir
TL;DR
This work shows that facial recognition systems based on Siamese networks can be covertly tampered with via Weight Surgery, a simple linear transformation of the last layer's weights, to install two kinds of backdoor: Shattered Class (any two images of a target identity are judged to be different people) and Merged Classes (images of two target identities are judged to be the same person). The backdoors require neither retraining nor input-time triggers, and multiple independent backdoors can coexist with minimal cross-interference, enabling anonymity and impersonation attacks. Experiments on FaceNet with LFW and SLLFW show high attack success rates (roughly 97–99%) and negligible degradation of benign accuracy, including with ten independent backdoors in a single model. The paper also discusses detection based on changes in the rank of the weight matrix, and a hiding variant that preserves the singular-value distribution. Together, these results point to a realistic, practical threat to open-set facial verification systems and to the need for robust defenses against identity-targeted backdoors.
Abstract
In this paper, we describe how to plant novel types of backdoors in any facial recognition model based on the popular architecture of deep Siamese neural networks. These backdoors force the system to err only on natural images of specific persons who are preselected by the attacker, without controlling their appearance or inserting any triggers. For example, we show how such a backdoored system can classify any two images of a particular person as different people, or any two images of a particular pair of persons as the same person, with almost no effect on the correctness of its decisions for other persons. Surprisingly, we show that both types of backdoors can be implemented by applying linear transformations to the model's last weight matrix, with no additional training or optimization, using only images of the backdoor identities. A unique property of our attack is that multiple backdoors can be independently installed in the same model by multiple attackers, who may not be aware of each other's existence, with almost no interference. We have experimentally verified the attacks on a state-of-the-art (SOTA) facial recognition system. When we tried to individually anonymize ten celebrities, the network failed to recognize two of their images as being the same person $97.02\%$ to $98.31\%$ of the time. When we tried to make the network confuse the extremely different-looking Morgan Freeman and Scarlett Johansson, for example, their images were declared to be the same person $98.47\%$ of the time. For each type of backdoor, we sequentially installed multiple backdoors with minimal effect on the performance of each other (for example, anonymizing all ten celebrities on the same model reduced the success rate for each celebrity by no more than $1.01\%$). In all of our experiments, the benign accuracy of the network on other persons barely degraded (in most cases, it degraded by less than $0.05\%$).
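To make the idea of a last-layer linear transformation concrete, the following toy NumPy sketch may help. It is an illustration under stated assumptions, not the authors' implementation: it assumes the transformation is an orthogonal projection that removes one direction from the embedding space (the target identity's mean embedding for Shattered Class, the difference of the two mean embeddings for Merged Classes), that the last layer has no bias, and that verification compares embeddings by cosine similarity. All names (`project_out`, `toy_identity`, `same_id`, `cross_id`) and the synthetic data are hypothetical.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def project_out(W_last, direction):
    """Return (I - d d^T) @ W_last, which removes `direction` from the layer's output space.
    Assumption: this projection stands in for the paper's linear transformation."""
    d = unit(direction)
    return W_last - np.outer(d, d @ W_last)

def mean_cosine(X, Y):
    """Mean cosine similarity over all pairs (row of X, row of Y)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return float((Xn @ Yn.T).mean())

rng = np.random.default_rng(0)
dim_in, dim_out, n = 256, 128, 20
W = rng.normal(size=(dim_out, dim_in))   # toy last-layer weights: embedding = W @ h
W_pinv = np.linalg.pinv(W)               # used only to synthesize activations h

def toy_identity():
    """Synthetic penultimate activations whose clean embeddings cluster around one centroid."""
    centroid = unit(rng.normal(size=dim_out))
    emb = centroid + 0.02 * rng.normal(size=(n, dim_out))
    return emb @ W_pinv.T                # rows h_i satisfying W @ h_i = emb_i

def embed(W_last, H):
    """Apply the (bias-free) last layer to a batch of activations."""
    return H @ W_last.T

H_a, H_b, H_c = toy_identity(), toy_identity(), toy_identity()  # A, B: targets; C: bystander

# Shattered Class sketch: remove the mean embedding direction of identity A.
W_sc = project_out(W, embed(W, H_a).mean(axis=0))
# Merged Classes sketch: remove the direction separating A's and B's mean embeddings.
W_mc = project_out(W, embed(W, H_a).mean(axis=0) - embed(W, H_b).mean(axis=0))

def same_id(W_last, H):
    """Similarity between two disjoint halves of one identity's images."""
    E = embed(W_last, H)
    return mean_cosine(E[: n // 2], E[n // 2:])

def cross_id(W_last, H1, H2):
    """Similarity between images of two different identities."""
    return mean_cosine(embed(W_last, H1), embed(W_last, H2))

print("A vs A  clean / shattered:", same_id(W, H_a), same_id(W_sc, H_a))
print("A vs B  clean / merged:   ", cross_id(W, H_a, H_b), cross_id(W_mc, H_a, H_b))
print("C vs C  clean / shattered / merged:",
      same_id(W, H_c), same_id(W_sc, H_c), same_id(W_mc, H_c))
print("rank    clean / shattered:",
      np.linalg.matrix_rank(W), np.linalg.matrix_rank(W_sc))
```

In this toy setting, the same-identity similarity for A collapses after the first projection, the A-versus-B similarity rises after the second, and the bystander identity C is largely unaffected. Each projection also lowers the rank of the weight matrix by one, which is the kind of change a rank-based detector would look for. Real embeddings are far less cleanly separated than this synthetic data, so the numbers here say nothing about the success rates reported in the abstract.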
