Certification of Speaker Recognition Models to Additive Perturbations
Dmitrii Korzh, Elvir Karimov, Mikhail Pautov, Oleg Y. Rogov, Ivan Oseledets
TL;DR
This work tackles provable robustness for speaker recognition under additive perturbations by transferring randomized smoothing from the image domain to audio embeddings. It formulates a few-shot enrollment setting in which each speaker is represented by an embedding centroid, and derives a certification guarantee for the smoothed embedding g(x): the prediction is guaranteed not to change under any additive perturbation whose norm stays below a radius R(φ, σ). The authors implement a practical pipeline using sample-based estimates and Hoeffding bounds, achieving state-of-the-art certified accuracy on VoxCeleb1/2 and comparing favorably to prior certified methods in few-shot settings. The results demonstrate the potential of certified robustness for voice biometrics, with implications for secure access and privacy-preserving speech technologies.
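The ingredients named above (a smoothed embedding estimated from noisy samples, a Hoeffding confidence bound, and nearest-centroid enrollment) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_encoder` merely L2-normalizes its input and stands in for a real speaker model, the function names are hypothetical, and the radius R(φ, σ) is not computed because its formula is not given in this summary. This is not the authors' implementation.

```python
import math
import random


def toy_encoder(x):
    """Hypothetical stand-in for a speaker encoder f: L2-normalizes the input."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def smoothed_embedding(encode, x, sigma, n_samples, rng):
    """Monte Carlo estimate of g(x) = E[f(x + eps)], with eps ~ N(0, sigma^2 I)."""
    total = [0.0] * len(encode(x))
    for _ in range(n_samples):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        total = [t + e for t, e in zip(total, encode(noisy))]
    return [t / n_samples for t in total]


def hoeffding_halfwidth(n_samples, alpha, value_range):
    """Two-sided Hoeffding half-width for the mean of n i.i.d. bounded samples.

    With probability at least 1 - alpha, the sample mean lies within this
    distance of the true mean; value_range is (upper bound - lower bound).
    """
    return value_range * math.sqrt(math.log(2.0 / alpha) / (2.0 * n_samples))


def centroid(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_centroid(embedding, centroids):
    """Name of the enrolled speaker whose centroid is closest in Euclidean distance."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda name: sq_dist(embedding, centroids[name]))


# Few-shot enrollment: two toy "utterances" per speaker (made-up vectors).
rng = random.Random(0)
sigma, n_samples = 0.1, 200
enrollment = {
    "A": [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]],
    "B": [[0.0, 1.0, 0.0, 0.0], [0.1, 0.9, 0.0, 0.0]],
}
centroids = {
    name: centroid([smoothed_embedding(toy_encoder, x, sigma, n_samples, rng) for x in xs])
    for name, xs in enrollment.items()
}

# Test utterance: estimate its smoothed embedding and assign the closest speaker.
query = smoothed_embedding(toy_encoder, [0.8, 0.2, 0.0, 0.0], sigma, n_samples, rng)
predicted = nearest_centroid(query, centroids)

# Each coordinate of a unit-norm embedding lies in [-1, 1], so its range is 2.
halfwidth = hoeffding_halfwidth(n_samples, alpha=0.05, value_range=2.0)
print(predicted, halfwidth)
```

In a full certification pipeline, a half-width of this kind would be used to make the sample-based estimate conservative before it enters the radius computation. How exactly that is done depends on the guarantee derived in the paper and is left out here.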
Abstract
Speaker recognition technology is applied to various tasks, from personal virtual assistants to secure access systems. However, the robustness of these systems against adversarial attacks, particularly additive perturbations, remains a significant challenge. In this paper, we pioneer the application of robustness certification techniques, originally developed for the image domain, to speaker recognition. Our work fills this gap by transferring randomized smoothing certification techniques for classification and few-shot learning, which guard against norm-bounded additive perturbations, to speaker recognition and improving them. We demonstrate the effectiveness of these methods on the VoxCeleb 1 and 2 datasets for several models. We expect this work to improve the robustness of voice biometrics and accelerate research on certification methods in the audio domain.
