Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification
Sonal Joshi, Thomas Thebaud, Jesús Villalba, Najim Dehak
TL;DR
The paper tackles adversarial threats to speaker identification and aims both to detect attacks and to attribute them to their victim models. It proposes a pipeline that uses a denoiser to estimate the adversarial perturbation and then a classifier to identify the attack type, alongside a binary detector for attack presence and a victim-model classifier. A new VoxCeleb-based dataset covers four victim models and eight attack types with perturbations constrained under $L_p$ norms, $p \in \{0, 1, 2, \infty\}$. Results show a peak attack-detection AUC of 0.982, an attack-type classification accuracy of 86.48% (excluding benign), and a victim-model classification accuracy of 72.28%, illustrating effective detection, discrimination among attacks, and forensic attribution.
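The perturbation-estimation step and the $L_p$ budgets mentioned above can be illustrated with the minimal sketch below. The $L_p$ magnitudes follow the standard definitions for $p \in \{0, 1, 2, \infty\}$. Everything else is a hypothetical stand-in: `toy_denoiser` is a moving average, not the paper's learned denoiser, and `estimate_perturbation` simply subtracts the denoised signal from the input, which is one straightforward reading of "using a denoiser to estimate perturbations". The downstream attack-type and victim-model classifiers are not reproduced here.

```python
import numpy as np

def lp_norms(delta: np.ndarray, tol: float = 0.0) -> dict:
    """Standard L_p magnitudes of a perturbation for p in {0, 1, 2, inf}.

    `tol` is an illustrative addition: entries with |value| <= tol are treated
    as zero when counting L0, so floating-point noise is ignored.
    """
    flat = np.asarray(delta, dtype=float).ravel()
    return {
        "L0": int(np.sum(np.abs(flat) > tol)),    # number of modified samples
        "L1": float(np.sum(np.abs(flat))),        # total absolute change
        "L2": float(np.sqrt(np.sum(flat ** 2))),  # Euclidean magnitude
        "Linf": float(np.max(np.abs(flat))),      # largest single-sample change
    }

def toy_denoiser(x: np.ndarray, width: int = 5) -> np.ndarray:
    """HYPOTHETICAL stand-in for a learned denoiser: a moving average.

    Edge padding keeps the output the same length as the input
    (this holds for odd `width`).
    """
    pad = width // 2
    padded = np.pad(x, pad, mode="edge")
    kernel = np.ones(width) / width
    return np.convolve(padded, kernel, mode="valid")

def estimate_perturbation(x_adv: np.ndarray) -> np.ndarray:
    """Estimated perturbation: the input minus its denoised (presumed clean) version.

    In a pipeline like the one described, a residual of this kind is what a
    downstream attack-type or victim-model classifier would consume.
    """
    return x_adv - toy_denoiser(x_adv)

if __name__ == "__main__":
    clean = np.ones(20)        # toy "benign" waveform (made-up data)
    adv = clean.copy()
    adv[10] += 1.0             # toy single-sample perturbation
    delta_hat = estimate_perturbation(adv)
    print(lp_norms(delta_hat, tol=1e-9))
```

With a real learned denoiser, the residual would be fed to a trained network rather than summarized by four norms. The norms are shown only because they are the quantities that bound each attack's perturbation budget.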
Abstract
Adversarial examples have been shown to threaten speaker identification systems, and several countermeasures against them have been proposed. In this paper, we propose a method to detect the presence of adversarial examples, i.e., a binary classifier distinguishing between benign and adversarial examples. We build upon and extend previous work on attack type classification by exploring new architectures. Additionally, we introduce a method for identifying the victim model on which the adversarial attack is carried out. To achieve this, we generate a new dataset containing multiple attacks performed against various victim models. We achieve an AUC of 0.982 for attack detection, with no more than a 0.03 drop in performance for unknown attacks. Our attack classification accuracy (excluding benign) reaches 86.48% across eight attack types using our LightResNet34 architecture, while our victim model classification accuracy reaches 72.28% across four victim models.
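The two headline metrics can be made concrete with the sketch below. Two assumptions apply. First, the detector is taken to output a scalar score where higher means "more likely adversarial", and AUC is computed as the standard rank statistic: the probability that a random adversarial example outscores a random benign one, with ties counting one half. Second, "accuracy (excluding benign)" is read as accuracy over examples whose true label is an attack. The function names, label strings, and scores are hypothetical.

```python
import numpy as np

def detection_auc(benign_scores, adversarial_scores) -> float:
    """ROC AUC for a binary detector (higher score = more likely adversarial).

    Equals the fraction of (adversarial, benign) pairs in which the adversarial
    example scores higher, with ties counted as 1/2 (Mann-Whitney U form).
    """
    b = np.asarray(benign_scores, dtype=float)
    a = np.asarray(adversarial_scores, dtype=float)
    diff = a[:, None] - b[None, :]  # every adversarial-vs-benign pair
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (a.size * b.size))

def accuracy_excluding(true_labels, pred_labels, excluded: str = "benign") -> float:
    """Accuracy over examples whose TRUE label is not `excluded`."""
    t = np.asarray(true_labels)
    p = np.asarray(pred_labels)
    keep = t != excluded
    return float(np.mean(t[keep] == p[keep]))

if __name__ == "__main__":
    print(detection_auc([0.1, 0.4], [0.2, 0.4]))  # → 0.625
    truth = ["benign", "attack_A", "attack_B", "attack_C"]
    preds = ["benign", "attack_A", "attack_A", "attack_C"]
    print(accuracy_excluding(truth, preds))  # 2 of the 3 attack examples are correct
```

The pairwise form is O(n·m) in memory and is meant only for readability. For large evaluation sets, a sort-based rank computation gives the same value.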
