ELEAT-SAGA: Early & Late Integration with Evading Alternating Training for Spoof-Robust Speaker Verification
Amro Asali, Yehuda Ben-Shimol, Itshak Lapidot
TL;DR
This work targets spoofing-robust speaker verification by introducing Score-Aware Gated Attention (SAGA), which modulates speaker embeddings with countermeasure (CM) scores. It systematically explores early, late, full, and fused integration strategies, and develops an alternating training regime (ATMM) and a refined variant, evading alternating training (EAT), to improve generalization to unseen attacks. The proposed ELEAT-SAGA, which combines early and late integration with EAT and uses early CM features and a bypass mechanism, achieves state-of-the-art SASV performance on ASVspoof 2019 LA (SASV-EER ≈ 1.22%) and strong results on SpoofCeleb, while reducing training time. The results indicate that score-based gating and carefully designed training procedures can substantially improve the spoofing resilience of SASV systems, with practical implications for deployable, secure biometric verification.
Abstract
Spoofing-robust automatic speaker verification (SASV) seeks to build automatic speaker verification systems that are robust against both zero-effort impostor attacks and sophisticated spoofing techniques such as voice conversion (VC) and text-to-speech (TTS). In this work, we propose a novel SASV architecture, SASV-SAGA, built on score-aware gated attention (SAGA), which dynamically modulates speaker embeddings based on countermeasure (CM) scores. We combine speaker embeddings from a pre-trained ECAPA-TDNN model with CM scores from a pre-trained AASIST model, and explore several integration strategies, including early, late, and full integration. We further introduce alternating training for multi-module (ATMM) and a refined variant, evading alternating training (EAT). Experimental results on the ASVspoof 2019 Logical Access (LA) and SpoofCeleb datasets demonstrate significant improvements over baselines, achieving a spoofing-aware speaker verification equal error rate (SASV-EER) of 1.22% and a minimum normalized agnostic detection cost function (min a-DCF) of 0.0304 on the ASVspoof 2019 evaluation set. These results confirm the effectiveness of score-aware attention mechanisms and alternating training strategies in enhancing the robustness of SASV systems.
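To make the idea of score-aware gating concrete, the following is a minimal sketch and not the authors' implementation. It assumes the CM score is a single logit (higher means more likely bona fide), that the gate is a hand-set scalar sigmoid rather than a learned attention layer, and that the final decision score is a dot product between unit-normalized embeddings. The function names and the `weight` and `bias` parameters are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def score_gated_embedding(embedding: List[float], cm_score: float,
                          weight: float = 1.0, bias: float = 0.0) -> List[float]:
    """Scale a speaker embedding by a gate derived from the CM score.

    Assumption: `weight` and `bias` stand in for parameters that a trained
    model would learn; here they are fixed by hand.
    """
    gate = sigmoid(weight * cm_score + bias)
    return [gate * x for x in embedding]


def sasv_score(enroll: List[float], test: List[float], cm_score: float) -> float:
    """Dot product of the unit-norm enrollment embedding and the gated test embedding."""
    enroll_n = l2_normalize(enroll)
    gated = score_gated_embedding(l2_normalize(test), cm_score)
    return sum(a * b for a, b in zip(enroll_n, gated))


if __name__ == "__main__":
    enroll = [0.6, 0.8]
    test = [0.6, 0.8]  # same speaker direction as the enrollment
    print(sasv_score(enroll, test, cm_score=5.0))   # CM confident: bona fide
    print(sasv_score(enroll, test, cm_score=-5.0))  # CM confident: spoof
```

In this sketch, a test utterance that matches the target speaker still receives a low score when the CM logit is strongly negative, which is the qualitative behavior that gating on CM scores is meant to provide. A scalar gate only affects the result because the score is a dot product; with cosine scoring the scaling would cancel, which is one reason a learned, per-dimension gate is the more natural design.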
