
Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification

Sonal Joshi, Thomas Thebaud, Jesús Villalba, Najim Dehak

TL;DR

The paper tackles adversarial threats in speaker identification, aiming both to detect attacks and to attribute them to their victim models. It proposes a pipeline that uses a denoiser to estimate perturbations and then a classifier to identify the attack type, alongside a binary detector for attack presence and a victim-model classifier. A new VoxCeleb-based dataset covers four victim models and eight attack types under $L_p$ norms (with $p \in \{0, 1, 2, \infty\}$). Results show a peak attack-detection AUC of 0.982, attack-type accuracy of 86.48%, and victim-model classification accuracy of 72.28%, illustrating effective detection, discrimination among attacks, and forensic attribution.
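The perturbation-estimation step can be illustrated with a minimal sketch. It assumes the perturbation estimate is the residual between the input and its denoised version. The moving-average filter below is only a stand-in for a denoiser, not the one used in the paper, and the signal, the function names, and the perturbation size are hypothetical toy choices.

```python
import numpy as np

def moving_average_denoiser(x, width=5):
    """Stand-in denoiser (an assumption, NOT the paper's model): moving-average filter."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

def estimate_perturbation(x_in, denoiser):
    """Estimate the perturbation as the residual between the input and its denoised version."""
    return x_in - denoiser(x_in)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1600)

# Toy smooth "utterance" and a toy sign perturbation bounded by 0.01 in L_inf norm.
benign = np.sin(2 * np.pi * 5 * t)
delta = 0.01 * np.sign(rng.standard_normal(t.size))
x_adv = benign + delta

residual_benign = estimate_perturbation(benign, moving_average_denoiser)
residual_adv = estimate_perturbation(x_adv, moving_average_denoiser)

# The residual of the perturbed signal carries far more energy than the benign one;
# a downstream classifier would take such residuals (or features of them) as input.
energy_benign = float(np.sum(residual_benign ** 2))
energy_adv = float(np.sum(residual_adv ** 2))
```

In this toy setting the residual energy alone separates the two signals. Telling eight attack types apart would need a learned classifier on top of the estimated perturbation.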

Abstract

Adversarial examples have proven to threaten speaker identification systems, and several countermeasures against them have been proposed. In this paper, we propose a method to detect the presence of adversarial examples, i.e., a binary classifier distinguishing between benign and adversarial examples. We build upon and extend previous work on attack type classification by exploring new architectures. Additionally, we introduce a method for identifying the victim model on which the adversarial attack is carried out. To achieve this, we generate a new dataset containing multiple attacks performed against various victim models. We achieve an AUC of 0.982 for attack detection, with no more than a 0.03 drop in performance for unknown attacks. Our attack classification accuracy (excluding benign) reaches 86.48% across eight attack types using our LightResNet34 architecture, while our victim model classification accuracy reaches 72.28% across four victim models.
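For readers unfamiliar with the detection metric, the following sketch shows how an AUC can be computed from detector scores. It uses the pairwise (Mann-Whitney) formulation; the scores are made-up toy values, not results from the paper, and `roc_auc` is a hypothetical helper name.

```python
import numpy as np

def roc_auc(scores_benign, scores_adv):
    """Fraction of (benign, adversarial) pairs in which the adversarial score is higher.

    Ties count as half a win. This is equivalent to the area under the ROC curve.
    """
    b = np.asarray(scores_benign, dtype=float)[:, None]
    a = np.asarray(scores_adv, dtype=float)[None, :]
    wins = (a > b).sum() + 0.5 * (a == b).sum()
    return wins / (b.size * a.size)

# Toy detector scores (higher means "more likely adversarial").
benign_scores = [0.1, 0.2, 0.35, 0.4]
adv_scores = [0.3, 0.6, 0.8, 0.9]

auc = roc_auc(benign_scores, adv_scores)  # 14 of the 16 pairs are ranked correctly
```

An AUC of 1.0 would mean every adversarial example scores above every benign one, while 0.5 corresponds to chance-level ranking.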

Paper Structure

This paper contains 23 sections, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Schematic of the computation of the SingleVM and the MultiVM datasets.
  • Figure 2: Pipeline of inference to predict the label linked to a given utterance $x'$ of the SingleVM or the MultiVM dataset, for the three experiments proposed.
  • Figure 3: Score distributions on the MultiVM-test data, for each system trained on all but one attack. Benign utterances are always shown in filled light blue; the attack held out from the training set is drawn filled, while the other attacks are drawn unfilled.
  • Figure 4: Normalized confusion matrix (%) using LResNet34 as the classifier for the SingleVM dataset.
  • Figure 5: Normalized confusion matrix (%) using ECAPA-TDNN as the classifier for the SingleVM dataset.
  • ...and 3 more figures