AdvSV: An Over-the-Air Adversarial Attack Dataset for Speaker Verification

Li Wang, Jiaqi Li, Yuhao Luo, Jiahao Zheng, Lei Wang, Hao Li, Ke Xu, Chengfang Fang, Jie Shi, Zhizheng Wu

TL;DR

AdvSV addresses the lack of a standard benchmark for over-the-air (OTA) adversarial attacks on speaker verification by introducing an open dataset built on the VoxCeleb1 verification test set. The authors generate perturbations with PGD and ensemble PGD, then record the adversarial samples in a controlled studio using a range of loudspeakers and microphones, which gives a reproducible evaluation setup. They also propose a one-class detection baseline for adversarial samples; its detection performance varies with the OTA conditions. Overall, the results indicate that both OTA and digital attacks transfer substantially, and the dataset offers a benchmark for developing more robust defenses for speaker verification systems.

Abstract

It is known that deep neural networks are vulnerable to adversarial attacks. Although Automatic Speaker Verification (ASV) built on top of deep neural networks exhibits robust performance in controlled scenarios, many studies confirm that ASV is vulnerable to adversarial attacks. The lack of a standard dataset is a bottleneck for further research, especially reproducible research. In this study, we developed an open-source adversarial attack dataset for speaker verification research. As an initial step, we focused on the over-the-air attack. An over-the-air adversarial attack involves a perturbation generation algorithm, a loudspeaker, a microphone, and an acoustic environment. The variations in the recording configurations make it very challenging to reproduce previous research. The AdvSV dataset is constructed using the VoxCeleb1 verification test set as its foundation. To build it, representative ASV models are subjected to adversarial attacks, and the resulting adversarial samples are recorded to simulate over-the-air attack settings. The scope of the dataset can be easily extended to include more types of adversarial attacks. The dataset will be released to the public under the CC BY-SA 4.0 license. We also provide a detection baseline for reproducible research.
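The perturbation generation step named above can be illustrated with a minimal sketch of an L-infinity projected gradient descent (PGD) loop. This is not the authors' implementation: the cosine-similarity scorer stands in for a real ASV model, and the names `cosine_score`, `cosine_score_grad`, `pgd_attack` and the values of `eps`, `alpha`, and `steps` are all assumptions chosen for illustration. An ensemble variant would typically combine gradients from several models in place of the single gradient used here; that is not shown.

```python
import numpy as np

def cosine_score(w, x):
    """Toy stand-in for an ASV score (hypothetical, not a real model):
    cosine similarity between an 'enrolled' vector w and a test signal x."""
    return float(w @ x / (np.linalg.norm(w) * np.linalg.norm(x) + 1e-12))

def cosine_score_grad(w, x):
    """Analytic gradient of the cosine similarity with respect to x."""
    nw = np.linalg.norm(w)
    nx = np.linalg.norm(x) + 1e-12
    return w / (nw * nx) - (w @ x) * x / (nw * nx ** 3)

def pgd_attack(x, w, eps=0.01, alpha=0.002, steps=20):
    """L-infinity PGD that tries to raise the score of x against w.

    eps   : maximum per-sample perturbation (assumed value)
    alpha : step size per iteration (assumed value)
    steps : number of iterations (assumed value)
    """
    x_adv = x.copy()
    for _ in range(steps):
        grad = cosine_score_grad(w, x_adv)
        x_adv = x_adv + alpha * np.sign(grad)      # signed gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the eps-ball
        x_adv = np.clip(x_adv, -1.0, 1.0)          # stay in a valid waveform range
    return x_adv
```

The projection step is what keeps the perturbation bounded: however many iterations run, no sample of `x_adv` differs from `x` by more than `eps`.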

Paper Structure (12 sections, 1 equation, 2 figures, 4 tables, 1 algorithm)

Figures (2)

  • Figure 1: Illustration of an over-the-air adversarial attack, consisting of (a) perturbation generation and (b) over-the-air attack steps.
  • Figure 2: Framework of the baseline system for detecting adversarial attacks. '-' denotes subtracting the re-synthesized spectrogram from the original spectrogram.
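The subtraction marked '-' in Figure 2 can be sketched as follows. This is only an illustration under stated assumptions: the section does not say how the re-synthesized signal is produced or which spectrogram type is used, so the sketch takes the re-synthesized waveform as an input and uses a plain magnitude spectrogram. The function names `magnitude_spectrogram` and `residual_feature`, and the `frame_len` and `hop` values, are hypothetical.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from Hann-windowed frames (assumed feature type)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

def residual_feature(original, resynthesized, frame_len=256, hop=128):
    """Original spectrogram minus re-synthesized spectrogram (the '-' in Figure 2).

    Both waveforms must have the same length so the frames line up.
    """
    spec_orig = magnitude_spectrogram(original, frame_len, hop)
    spec_resyn = magnitude_spectrogram(resynthesized, frame_len, hop)
    return spec_orig - spec_resyn
```

A detector would then take this residual as its input feature; the detector itself is outside the scope of this sketch.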