Spoofing-Robust Speaker Verification Using Parallel Embedding Fusion: BTU Speech Group's Approach for ASVspoof5 Challenge

Oğuzhan Kurnaz, Selim Can Demirtaş, Aykut Büker, Jagabandhu Mishra, Cemal Hanilçi

TL;DR

Experimental results demonstrate that the proposed parallel DNN structure outperforms traditional single DNN methods, offering a more reliable and secure speaker verification system against spoofing attacks.

Abstract

This paper introduces the parallel network-based spoofing-aware speaker verification (SASV) system developed by BTU Speech Group for the ASVspoof5 Challenge. The SASV system integrates automatic speaker verification (ASV) and countermeasure (CM) systems to enhance security against spoofing attacks. Our approach employs score and embedding fusion from ASV models (ECAPA-TDNN, WavLM) and CM models (AASIST). The fused embeddings are processed by a simple DNN, which is trained with a combination of the recently proposed a-DCF loss and the BCE loss. We introduce a novel parallel network structure in which two identical DNNs, fed with different inputs, independently process embeddings and produce SASV scores. The final SASV probability is obtained by averaging these two scores, which improves robustness and accuracy. Experimental results demonstrate that the proposed parallel DNN structure outperforms traditional single-DNN methods, offering a more reliable and secure speaker verification system against spoofing attacks.
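
The parallel structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the two-layer MLP, the random weights, and the names `init_mlp`, `mlp_prob`, and `parallel_sasv_prob` are all assumptions made for illustration, and the abstract does not say which embeddings feed which branch. The sketch only shows the idea of two DNNs with the same architecture, each receiving a different fused input, whose output probabilities are averaged.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim, hidden_dim=64):
    """Randomly initialised two-layer MLP (hypothetical sizes, untrained)."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, 1)),
        "b2": np.zeros(1),
    }

def mlp_prob(params, x):
    """Forward pass: ReLU hidden layer, then a sigmoid output in (0, 1)."""
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)
    logit = h @ params["W2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logit[:, 0]))

def parallel_sasv_prob(branch_a, branch_b, x_a, x_b):
    """Average the probabilities produced independently by the two branches."""
    return 0.5 * (mlp_prob(branch_a, x_a) + mlp_prob(branch_b, x_b))

# Two branches with identical architecture but separate weights.
in_dim = 32                            # illustrative fused-embedding size
branch_a = init_mlp(in_dim)
branch_b = init_mlp(in_dim)

# Placeholder fused inputs for a batch of 4 trials (random stand-ins for
# concatenated ASV and CM embeddings; the real composition is an assumption).
x_a = rng.normal(size=(4, in_dim))
x_b = rng.normal(size=(4, in_dim))

p_final = parallel_sasv_prob(branch_a, branch_b, x_a, x_b)
print(p_final.shape)
```

In training, a loss such as BCE would be applied to these probabilities; the a-DCF term mentioned in the abstract is left out here because its definition is not given in this summary.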

Paper Structure

This paper contains 15 sections, 7 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Baseline Systems. Baseline 1 and Baseline 2 refer to score fusion and embedding fusion, respectively
  • Figure 2: Proposed Parallel Model
  • Figure 3: Score distributions by class label for systems S1, S3, S11, and S12. S1: average of the ASV and CM scores obtained from the ECAPA-TDNN and AASIST models. S3: prediction scores of a DNN trained with BCE loss on embeddings extracted from the ECAPA-TDNN and AASIST models. S11: prediction scores of the proposed parallel DNN, trained with a combination of a-DCF and BCE losses on embeddings from the ECAPA-TDNN and AASIST models. S12: prediction scores of the proposed parallel DNN, trained with the same combination of losses on embeddings from the WavLM and AASIST models.
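
The score-fusion baseline (Baseline 1, system S1) averages an ASV score and a CM score. A rough sketch of that averaging follows. How each score is computed is an assumption here: the ASV score is taken as the cosine similarity between enrollment and test embeddings, and the CM score as a bona fide probability. The function names and example values are hypothetical.

```python
import numpy as np

def cosine_score(enroll_emb, test_emb):
    """Assumed ASV score: cosine similarity of two speaker embeddings."""
    num = float(np.dot(enroll_emb, test_emb))
    den = float(np.linalg.norm(enroll_emb) * np.linalg.norm(test_emb))
    return num / den

def fused_score(asv_score, cm_score):
    """Score fusion by simple averaging of the ASV and CM scores."""
    return 0.5 * (asv_score + cm_score)

# Toy embeddings standing in for ECAPA-TDNN outputs (made-up values).
enroll = np.array([1.0, 2.0, 2.0])
test = np.array([1.0, 2.0, 2.0])

asv = cosine_score(enroll, test)   # identical vectors give a similarity of 1
cm = 0.5                           # made-up CM bona fide probability
sasv = fused_score(asv, cm)
print(sasv)
```

Cosine similarity lies in [-1, 1] while a probability lies in [0, 1], so a real system might rescale or calibrate the two scores before averaging; that step is omitted from this sketch.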