Assessing the Impact of Speaker Identity in Speech Spoofing Detection

Anh-Tuan Dao; Driss Matrouf; Nicholas Evans

TL;DR

This paper proposes two approaches within the Speaker-Invariant Multi-Task framework, one that models speaker identity within the embeddings and another that removes it. Evaluated on four datasets, the speaker-invariant model reduces the average equal error rate by 17% compared to the baseline.

Abstract

Spoofing detection systems are typically trained using diverse recordings from multiple speakers, often under the assumption that the resulting embeddings are independent of speaker identity. However, this assumption remains unverified. In this paper, we investigate the impact of speaker information on spoofing detection systems. We propose two approaches within our Speaker-Invariant Multi-Task (SInMT) framework, one that models speaker identity within the embeddings and another that removes it. SInMT integrates multi-task learning for joint speaker recognition and spoofing detection, incorporating a gradient reversal layer. Evaluated on four datasets, our speaker-invariant model reduces the average equal error rate by 17% compared to the baseline, with up to a 48% reduction for the most challenging attacks (e.g., A11).
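To make the role of the gradient reversal layer (GRL) more concrete, the following is a minimal sketch in plain Python of how such a layer is commonly defined: it acts as the identity in the forward pass and flips the sign of the gradient (scaled by a factor) in the backward pass. It is not the authors' implementation. The class name `GradientReversal`, the helper `encoder_gradient`, the strength `lam`, the toy gradient values, and the placement of the GRL in front of the speaker classifier are all assumptions made for illustration; a practical system would rely on an automatic differentiation framework rather than hand-written gradients.

```python
from dataclasses import dataclass


@dataclass
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass."""

    lam: float = 1.0  # hypothetical reversal strength, not a value from the paper

    def forward(self, x):
        # The embedding passes through unchanged.
        return list(x)

    def backward(self, grad_output):
        # The gradient coming from the speaker head is sign-flipped and scaled.
        return [-self.lam * g for g in grad_output]


def encoder_gradient(grad_spoof, grad_speaker, grl):
    """Sum the gradients that reach the shared embedding from both heads.

    grad_spoof:   gradient of the spoofing loss w.r.t. the embedding
    grad_speaker: gradient of the speaker loss w.r.t. the embedding
    Assumes the GRL sits only in front of the speaker head.
    """
    reversed_speaker = grl.backward(grad_speaker)
    return [gs + gr for gs, gr in zip(grad_spoof, reversed_speaker)]


# Toy values chosen only to show the sign flip.
grl = GradientReversal(lam=0.5)
embedding = [0.2, -1.0, 3.0]
passed_through = grl.forward(embedding)
total = encoder_gradient([1.0, 0.0, -2.0], [4.0, 2.0, -2.0], grl)
```

Under these assumptions, the speaker classifier still learns to recognise speakers, while the shared encoder receives the opposite signal and is pushed towards embeddings that carry less speaker information. Setting `lam` to a negative value would instead pass the speaker gradient with its original sign, which corresponds to the ordinary multi-task case where speaker identity is modelled rather than removed.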

Paper Structure (16 sections, 4 equations, 2 figures, 2 tables)

Introduction
SSL-based Spoofing Detection
Speaker-Invariant Multi-Task Framework for Spoofing Detection
Architecture
Speaker-Invariant Adversarial Training
Parameter Optimization via Backpropagation
Experimental Setup
Datasets and Metrics
Data Augmentation
Implementation Details
Results
Model Performance Comparison
Visualisation
Attack Types Breakdown Analysis
Conclusion
...and 1 more section

Figures (2)

Figure 1: Our model architecture with dual classifiers and GRL.
Figure 2: t-SNE visualization of embeddings from MHFA, MHFA-spk and MHFA-IVspk models, showcasing audio representations of ten distinct speakers from the ASVspoof 5 dataset, each represented by a unique color.
