Comparative Study on Noise-Augmented Training and its Effect on Adversarial Robustness in ASR Systems
Karla Pizzi, Matías Pizarro, Asja Fischer
TL;DR
This work investigates whether noise-augmented training enhances adversarial robustness in ASR. Using four end-to-end SpeechBrain architectures trained under three augmentation regimes, the authors evaluate against white-box C&W and two untargeted black-box attacks, employing SI-SDR, dB$_x$, and SNR$_{seg}$ to measure perceptual distortion. The results show that noise augmentation improves both noise robustness and adversarial robustness across architectures, with seq2seq models benefiting most and transformer-based models showing moderate gains. These findings support adopting noise-aware augmentation as a practical, scalable defense to bolster the reliability and security of ASR systems in real-world environments.
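Of the distortion measures named above, SI-SDR has a compact closed form, so a minimal sketch may help make it concrete. The function below follows the commonly used definition (project the perturbed signal onto the clean reference, then compare the energy of the projection with the energy of the residual). The function name `si_sdr`, the mean removal, and the `eps` stabiliser are illustrative assumptions and are not taken from the paper's code.

```python
import numpy as np

def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch).

    reference: clean waveform; estimate: perturbed (e.g. adversarial) waveform.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Assumption: remove the mean first, as is commonly done for SI-SDR.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference that best explains the estimate.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    # Ratio of projected energy to residual energy; eps avoids log(0).
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )
```

Under this definition a higher value means the perturbation is smaller relative to the clean signal, and rescaling the perturbed waveform leaves the value unchanged.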
Abstract
In this study, we investigate whether noise-augmented training can also improve adversarial robustness in automatic speech recognition (ASR) systems. We conduct a comparative analysis of the adversarial robustness of four different ASR architectures, each trained under three different augmentation conditions: (1) background noise, speed variations, and reverberations; (2) speed variations only; (3) no data augmentation. We then evaluate the robustness of all resulting models against white-box and black-box adversarial attacks. Our results demonstrate that noise augmentation not only enhances model performance on noisy speech but also improves the model's robustness to adversarial attacks.
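The three augmentation conditions can be illustrated with a short NumPy sketch. This is not the SpeechBrain pipeline used in the study: the function names, the speed factors, the 15 dB SNR, the synthetic white noise and impulse response, and the order of operations are all assumptions chosen only to show how the conditions differ.

```python
import numpy as np

def change_speed(wave, factor):
    """Resample by linear interpolation; factor > 1 shortens the signal."""
    n_out = max(1, int(round(len(wave) / factor)))
    positions = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(positions, np.arange(len(wave)), wave)

def add_noise(wave, noise, snr_db):
    """Mix in noise scaled so that the signal-to-noise ratio equals snr_db."""
    noise = np.resize(noise, len(wave))
    signal_power = np.mean(wave ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return wave + scale * noise

def add_reverb(wave, impulse_response):
    """Convolve with a room impulse response and keep the original length."""
    return np.convolve(wave, impulse_response)[: len(wave)]

def augment(wave, condition, rng):
    """Apply one of the three conditions: 'none', 'speed', 'noise_speed_reverb'."""
    if condition == "none":
        return wave.copy()
    # Hypothetical speed factors; the paper's values are not given here.
    out = change_speed(wave, rng.choice([0.95, 1.0, 1.05]))
    if condition == "speed":
        return out
    if condition == "noise_speed_reverb":
        # Synthetic stand-ins for a recorded impulse response and noise clip.
        ir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)
        ir[0] = 1.0
        out = add_reverb(out, ir)
        noise = rng.standard_normal(len(out))
        return add_noise(out, noise, snr_db=15.0)
    raise ValueError(f"unknown condition: {condition}")
```

In a training loop, `augment` would be called on each waveform before feature extraction, with the `condition` string fixed per model so that the three resulting models differ only in their augmentation.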
