
Integrated Spoofing-Robust Automatic Speaker Verification via a Three-Class Formulation and LLR

Kai Tan, Lin Zhang, Ruiteng Zhang, Johan Rohdin, Leibny Paola García-Perera, Zexin Cai, Sanjeev Khudanpur, Matthew Wiesner, Nicholas Andrews

Abstract

Spoofing-robust automatic speaker verification (SASV) aims to integrate automatic speaker verification (ASV) with a spoofing countermeasure (CM). A popular solution is to fuse independent ASV and CM scores. To model SASV more directly, some frameworks integrate ASV and CM within a single network. However, these solutions are typically bi-encoder based, offer limited interpretability, and cannot be readily adapted to new evaluation parameters without retraining. To address these limitations, we propose a unified end-to-end framework based on a three-class formulation that enables log-likelihood ratio (LLR) inference from class logits, yielding a more interpretable decision pipeline. Experiments show performance comparable to existing methods on ASVspoof5 and better results on SpoofCeleb. Visualization and analysis further show that the three-class reformulation improves interpretability.
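
As a rough illustration of what "LLR inference from class logits" could look like, the sketch below converts three class logits into a single SASV log-likelihood ratio. Everything specific here is an assumption and is not taken from the paper's equations, which are not shown on this page: the class order (target, bona fide non-target, spoof), the prior-correction step that turns softmax posteriors into scaled likelihoods, and the hypothetical `spoof_weight` parameter that sets the share of spoof trials within the negative hypothesis at evaluation time.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sasv_llr(logits, train_priors, spoof_weight):
    """Hypothetical LLR from three class logits.

    logits:       (target, nontarget, spoof) scores; this order is assumed.
    train_priors: class proportions seen in training (assumed known).
    spoof_weight: assumed evaluation-time share of spoof trials among
                  all negative (nontarget or spoof) trials, in [0, 1].
    """
    if not 0.0 <= spoof_weight <= 1.0:
        raise ValueError("spoof_weight must lie in [0, 1]")

    # Log-softmax gives log posteriors log P(c | x).
    lse = logsumexp(list(logits))
    log_post = [z - lse for z in logits]

    # Dividing by the training prior gives likelihoods up to a shared
    # constant: log p(x | c) = log P(c | x) - log pi_c + const.
    tar, non, spf = (lp - math.log(p) for lp, p in zip(log_post, train_priors))

    # Negative hypothesis: a mixture of nontarget and spoof likelihoods.
    terms = []
    if spoof_weight < 1.0:
        terms.append(math.log(1.0 - spoof_weight) + non)
    if spoof_weight > 0.0:
        terms.append(math.log(spoof_weight) + spf)

    # The shared constant cancels in the ratio.
    return tar - logsumexp(terms)


uniform = [1 / 3, 1 / 3, 1 / 3]
llr_accept = sasv_llr([2.0, 0.0, 0.0], uniform, spoof_weight=0.5)
llr_spoof = sasv_llr([0.0, 0.0, 3.0], uniform, spoof_weight=0.5)
```

Under these assumptions, changing `spoof_weight` re-weights the negative hypothesis without touching the network, which is one way a logit-based LLR could adapt to new evaluation parameters without retraining.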

Paper Structure

This paper contains 25 sections, 9 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Comparison among (a) a fusion-based system, (b) an existing bi-encoder-based integrated SASV system, and (c) the proposed cross-encoder-based integrated SASV system via a three-class formulation and LLR.
  • Figure 2: Comparison of score distributions between the B04 baseline and the proposed 3T2-SASV on SpoofCeleb.
  • Figure 3: Class-conditional score distribution of the integrated B04 method and the proposed method on ASVspoof5.
  • Figure 4: Attack-wise score distribution of the spoof category in the proposed method on ASVspoof5.