Novel Loss-Enhanced Universal Adversarial Patches for Sustainable Speaker Privacy
Elvir Karimov, Alexander Varlamov, Danil Ivanov, Dmitrii Korzh, Oleg Y. Rogov
TL;DR
This work tackles privacy in speaker recognition by enhancing universal adversarial patches (UAPs) for speaker anonymization. It introduces an Exponential TV loss to preserve imperceptibility, and a length-agnostic, tiling-based UAP generation procedure evaluated under a rigorous length-agnostic protocol. Empirical results on VoxCeleb2 show that the proposed loss improves the trade-off between fooling rate and audio quality (higher SNR and PESQ) while maintaining robust performance across varying audio lengths, outperforming prior UAP methods. The approach advances practical, real-world speaker privacy solutions by enabling durable, low-distortion anonymization across diverse utterance lengths and models.
Abstract
Deep learning voice models are now widely used, but whether they process personal data, such as human identity and speech content, safely remains in question. Speaker anonymization methods have been proposed to prevent malicious user identification. Current methods, particularly those based on universal adversarial patches (UAPs), have several drawbacks: significant degradation of audio quality, decreased speech recognition quality, low transferability across different voice biometrics models, and performance that depends on the input audio length. To mitigate these drawbacks, we introduce the novel Exponential Total Variance (TV) loss function and provide experimental evidence that it improves both UAP strength and imperceptibility. Moreover, we present a novel scalable UAP insertion procedure and demonstrate its uniformly high performance across various audio lengths.
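As a rough illustration of what a length-agnostic, tiling-based insertion could look like, the following sketch repeats a fixed-length patch until it covers an utterance of arbitrary length and reports the resulting SNR. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`tile_patch`, `apply_patch`, `snr_db`), the additive insertion, and the simple truncation of the last tile are all hypothetical choices made for this example. The Exponential TV loss is not shown because its definition is not given here.

```python
import numpy as np

def tile_patch(patch: np.ndarray, length: int) -> np.ndarray:
    """Repeat a 1-D patch end to end and truncate it to `length` samples."""
    if patch.size == 0:
        raise ValueError("patch must contain at least one sample")
    reps = -(-length // patch.size)  # ceiling division
    return np.tile(patch, reps)[:length]

def apply_patch(audio: np.ndarray, patch: np.ndarray) -> np.ndarray:
    """Add the tiled patch to the waveform (assumed additive insertion)."""
    return audio + tile_patch(patch, audio.size)

def snr_db(clean: np.ndarray, perturbed: np.ndarray) -> float:
    """Signal-to-noise ratio in dB between the clean and perturbed waveforms."""
    noise = perturbed - clean
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = 0.01 * rng.standard_normal(16000)   # hypothetical 1 s patch at 16 kHz
    for seconds in (0.5, 2.3, 7.0):             # utterances of different lengths
        audio = 0.1 * rng.standard_normal(int(seconds * 16000))
        protected = apply_patch(audio, patch)
        print(f"{seconds:>4} s  SNR = {snr_db(audio, protected):.1f} dB")
```

Because the same patch is reused for every utterance and only its tiling changes, the perturbation stays universal while the input length is free to vary; a real pipeline would likely also need clipping to the valid amplitude range.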
