An Initial Investigation of Neural Replay Simulator for Over-the-Air Adversarial Perturbations to Automatic Speaker Verification

Jiaqi Li, Li Wang, Liumeng Xue, Lei Wang, Zhizheng Wu

TL;DR

This work analyzes the vulnerability of automatic speaker verification (ASV) to over-the-air (OTA) adversarial perturbations, highlighting how the replay channel can affect attack success. It introduces a neural replay simulator based on Wave-U-Net to approximate the replay process and integrates it into a cascade ensemble PGD framework to produce robust OTA adversarial examples. Experiments on the ASVspoof2019 dataset across four ASV architectures show that the neural replay model substantially increases OTA attack success without reducing digital attack effectiveness, underscoring security concerns for physical-access ASV deployments. The findings motivate the development of defenses against OTA perturbations and further research into generalizing replay modeling across diverse hardware and environments.

Abstract

Deep learning has advanced automatic speaker verification (ASV) in the past few years. Although deep learning-based ASV systems are known to be vulnerable to adversarial examples in digital access, few studies address adversarial attacks in physical access, where a replay process (i.e., over the air) is involved. An over-the-air attack involves a loudspeaker, a microphone, and a replay environment that affects the propagation of the sound wave. Our initial experiment confirms that the replay process affects the effectiveness of over-the-air attacks. This study is an initial investigation into using a neural replay simulator to improve the robustness of over-the-air adversarial attacks. This is achieved by using a neural waveform synthesizer to simulate the replay process when estimating the adversarial perturbations. Experiments on the ASVspoof2019 dataset confirm that the neural replay simulator can considerably increase the success rates of over-the-air adversarial attacks. This raises concerns about adversarial attacks on speaker verification in physical access applications.

Paper Structure (12 sections, 4 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: A comparison of digital and over-the-air adversarial attacks on ASV systems.
  • Figure 2: The pipeline to synthesize robust over-the-air adversarial examples utilizing a neural replay simulator. The PGD algorithm synthesizes an adversarial perturbation by attacking an ASV model through a pre-trained neural replay simulator. The perturbation is added to a bonafide utterance to form an adversarial example that is expected to remain effective after replay.
  • Figure 3: An illustration of an adversarial example crossing the decision boundary to manipulate the decision of an ASV system.
  • Figure 4: Model structure of the neural replay simulator.
  • Figure 5: A digital and over-the-air joint attack framework.
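
The pipeline described in the Figure 2 caption can be sketched roughly as follows. This is only a toy illustration under strong assumptions, not the authors' implementation: the "ASV model" is a hypothetical random linear encoder `W` scored by cosine similarity, the "replay simulator" is a hypothetical fixed FIR channel `H` standing in for the Wave-U-Net model, and `eps`, `alpha`, and `steps` are arbitrary values. It shows only the core idea of running PGD on the score of the replayed signal, so that the gradient passes through the replay model.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 256, 16  # toy waveform length and embedding size (assumed values)

# Hypothetical stand-ins, NOT the paper's models:
W = rng.standard_normal((D, N)) / np.sqrt(N)   # linear "speaker encoder"
h = np.array([0.1, 0.6, 0.3])                  # short FIR "replay channel"
# Channel as a matrix so that (H @ x)[n] = sum_k h[k] * x[n - k].
H = sum(np.eye(N, k=-k) * c for k, c in enumerate(h))


def score(x, target):
    """Cosine similarity between the embedding of the replayed x and target."""
    e = W @ (H @ x)
    return float(e @ target / (np.linalg.norm(e) * np.linalg.norm(target)))


def score_grad(x, target):
    """Analytic gradient of score() with respect to the waveform x."""
    e = W @ (H @ x)
    ne, nt = np.linalg.norm(e), np.linalg.norm(target)
    cos = e @ target / (ne * nt)
    de = target / (ne * nt) - cos * e / ne**2   # d(cos)/d(embedding)
    return H.T @ (W.T @ de)                     # chain rule back through W and H


def pgd_through_replay(x, target, eps=0.05, alpha=0.005, steps=50):
    """PGD (L-infinity ball of radius eps) maximizing the post-replay score."""
    delta = np.zeros_like(x)
    best_delta, best = delta.copy(), score(x, target)
    for _ in range(steps):
        g = score_grad(x + delta, target)
        # Signed gradient ascent step, then projection onto the eps-ball.
        delta = np.clip(delta + alpha * np.sign(g), -eps, eps)
        s = score(x + delta, target)
        if s > best:  # keep the best iterate seen so far
            best, best_delta = s, delta.copy()
    return best_delta


x = rng.standard_normal(N) * 0.5   # toy "bonafide" utterance
target = rng.standard_normal(D)    # toy target-speaker embedding
delta = pgd_through_replay(x, target)
clean_score = score(x, target)
adv_score = score(x + delta, target)
print(f"score after replay: clean={clean_score:.3f}, adversarial={adv_score:.3f}")
```

In a real setting both stand-ins would be differentiable neural networks and the gradient would come from automatic differentiation rather than the closed form used here. Dropping `H` from `score` and `score_grad` would give the purely digital attack for comparison.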