Integrated Spoofing-Robust Automatic Speaker Verification via a Three-Class Formulation and LLR

Kai Tan; Lin Zhang; Ruiteng Zhang; Johan Rohdin; Leibny Paola García-Perera; Zexin Cai; Sanjeev Khudanpur; Matthew Wiesner; Nicholas Andrews

Abstract

Spoofing-robust automatic speaker verification (SASV) aims to integrate automatic speaker verification (ASV) with countermeasures (CM). A popular solution is to fuse the scores of independent ASV and CM systems. To better model SASV, some frameworks integrate ASV and CM within a single network. However, these solutions are typically bi-encoder based, offer limited interpretability, and cannot be readily adapted to new evaluation parameters without retraining. To address these limitations, we propose a unified end-to-end framework built on a three-class formulation that enables log-likelihood ratio (LLR) inference from class logits, yielding a more interpretable decision pipeline. Experiments show performance comparable to existing methods on ASVspoof5 and better results on SpoofCeleb. Visualization and analysis further indicate that the three-class reformulation improves interpretability.
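The idea of deriving an LLR from three-class logits can be illustrated with a minimal sketch. It is not the paper's implementation: the class order (target, non-target, spoof), the function name `sasv_llr`, the prior arguments, and the mixture form of the negative hypothesis are all assumptions made for illustration, since the paper's own equations are not reproduced in this excerpt. The sketch only shows why evaluation priors could be changed at inference time without retraining.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def sasv_llr(logits, train_priors, p_nontarget, p_spoof):
    """Hypothetical helper: binary SASV LLR from three-class logits.

    Assumed class order: (target, non-target, spoof).
    train_priors: class proportions assumed to hold in the training trials.
    p_nontarget, p_spoof: evaluation priors of the two negative classes
    (both must be > 0 in this sketch).
    """
    log_post = log_softmax(logits)
    # Bayes' rule: posterior is proportional to likelihood * training prior,
    # so dividing out the prior gives log-likelihoods up to a shared constant.
    log_lik = [lp - math.log(pr) for lp, pr in zip(log_post, train_priors)]
    # Negative hypothesis (assumed): mixture of non-target and spoof,
    # weighted by evaluation priors renormalized over the negative classes.
    total = p_nontarget + p_spoof
    a = math.log(p_nontarget / total) + log_lik[1]
    b = math.log(p_spoof / total) + log_lik[2]
    m = max(a, b)
    log_neg = m + math.log(math.exp(a - m) + math.exp(b - m))
    # The shared constant cancels in the ratio.
    return log_lik[0] - log_neg


uniform = [1 / 3, 1 / 3, 1 / 3]
llr_few_spoofs = sasv_llr([1.0, 0.0, -3.0], uniform, p_nontarget=0.95, p_spoof=0.05)
llr_many_spoofs = sasv_llr([1.0, 0.0, -3.0], uniform, p_nontarget=0.50, p_spoof=0.50)
```

Under these assumptions, the same logits yield different LLRs for different evaluation priors, so only the two prior arguments change between operating points while the network stays fixed.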

Paper Structure (25 sections, 9 equations, 4 figures, 4 tables)

Introduction
Proposed method
Definition of SASV
Cross-encoder SASV with three-class objective
Construct hard pairs for training
Embedding extractor
Aggregation via cross-attention
Three-class objective
From three-class to binary-decision via LLR
Reformulate SASV as three-class classification
Convert three-class prediction to binary-decision
Experimental Setup
Database and training trials
Metrics
Configuration
...and 10 more sections

Figures (4)

Figure 1: Comparison among (a) a fusion-based system, (b) an existing bi-encoder-based integrated SASV system, and (c) the proposed cross-encoder-based integrated SASV system using the three-class formulation and LLR.
Figure 2: Comparison of score distributions between the B04 baseline and the proposed 3T2-SASV on SpoofCeleb.
Figure 3: Class-conditional score distribution of the integrated B04 method and the proposed method on ASVspoof5.
Figure 4: Attack-wise score distribution of the spoof category in the proposed method on ASVspoof5.
