An Initial Investigation of Neural Replay Simulator for Over-the-Air Adversarial Perturbations to Automatic Speaker Verification
Jiaqi Li, Li Wang, Liumeng Xue, Lei Wang, Zhizheng Wu
TL;DR
This work analyzes the vulnerability of automatic speaker verification (ASV) to over-the-air adversarial perturbations, highlighting how the replay channel can affect attack success. It introduces a neural replay simulator based on Wave-U-Net to approximate the replay process and integrates it into a cascade ensemble PGD framework to produce robust OTA adversarial examples. Experiments on the ASVspoof2019 dataset across four ASV architectures show that the neural replay model substantially increases OTA attack success without reducing digital attack effectiveness, underscoring security concerns for physical-access ASV deployments. The findings motivate the development of defenses against OTA perturbations and further research into generalizing replay modeling across diverse hardware and environments.
Abstract
Deep learning has advanced automatic speaker verification (ASV) in the past few years. Although deep learning-based ASV systems are known to be vulnerable to adversarial examples in digital access, few studies have examined adversarial attacks in physical access, where a replay process (i.e., over the air) is involved. An over-the-air attack involves a loudspeaker, a microphone, and a replay environment that affects the propagation of the sound wave. Our initial experiment confirms that the replay process affects the effectiveness of over-the-air attacks. This study is an initial investigation into using a neural replay simulator to improve the robustness of over-the-air adversarial attacks. This is achieved by using a neural waveform synthesizer to simulate the replay process when estimating the adversarial perturbations. Experiments conducted on the ASVspoof2019 dataset confirm that the neural replay simulator can considerably increase the success rates of over-the-air adversarial attacks. This raises concerns about adversarial attacks on speaker verification in physical access applications.
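The core idea of estimating a perturbation through a replay simulator can be illustrated with a minimal sketch. It is not the authors' implementation: a short FIR filter stands in for the neural (Wave-U-Net) replay simulator, a linear scorer stands in for the ASV model, and the names `replay`, `score`, `score_grad`, and `pgd_through_replay` are hypothetical. The cascade ensemble part of the framework is not covered. Under these assumptions, projected gradient descent (PGD) raises the score of the replayed waveform while keeping the perturbation inside an L-infinity ball of radius `eps`.

```python
import random


def replay(x, h):
    """Toy replay channel (assumption): causal FIR filter y[i] = sum_k h[k] * x[i-k]."""
    return [sum(h[k] * x[i - k] for k in range(len(h)) if i - k >= 0)
            for i in range(len(x))]


def score(x, w, h):
    """Toy verification score (assumption): linear scorer on the replayed waveform."""
    return sum(wi * yi for wi, yi in zip(w, replay(x, h)))


def score_grad(w, h, n):
    """Gradient of score w.r.t. the input samples: g[j] = sum_k h[k] * w[j+k].

    A real system would obtain this by backpropagating through the
    simulator and the ASV model; here it is available in closed form.
    """
    return [sum(h[k] * w[j + k] for k in range(len(h)) if j + k < n)
            for j in range(n)]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_through_replay(x, w, h, eps=0.01, alpha=0.002, steps=10):
    """PGD that maximizes the post-replay score under an L-infinity bound."""
    n = len(x)
    delta = [0.0] * n
    for _ in range(steps):
        # Gradient taken through the replay simulator, so the update
        # accounts for the channel (constant here: the toy model is linear).
        g = score_grad(w, h, n)
        stepped = [d + alpha * sign(gj) for d, gj in zip(delta, g)]
        # Projection step: clip the perturbation back into [-eps, eps].
        delta = [max(-eps, min(eps, d)) for d in stepped]
    return [xi + di for xi, di in zip(x, delta)]


if __name__ == "__main__":
    random.seed(0)
    x = [random.uniform(-1, 1) for _ in range(64)]   # toy "waveform"
    w = [random.uniform(-1, 1) for _ in range(64)]   # toy scorer weights
    h = [0.6, 0.3, 0.1]                              # toy impulse response
    x_adv = pgd_through_replay(x, w, h)
    print("score gain after replay:", score(x_adv, w, h) - score(x, w, h))
```

Placing the simulator inside the gradient computation is the design choice that matters here: the perturbation is optimized for the signal the microphone would record, not for the digital waveform alone.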
