The SVASR System for Text-dependent Speaker Verification (TdSV) AAIC Challenge 2024

Mohammadreza Molavi, Reza Khodadadi

TL;DR

Introduces an efficient and accurate pipeline for text-dependent speaker verification (TdSV) that fuses speaker embeddings from wav2vec-BERT and ReDimNet models into a unified speaker representation.
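The exact fusion operator is not given here, so the following Python sketch assumes one simple option: L2-normalize each model's embedding, concatenate the two, renormalize, and compare enrollment and test vectors with cosine similarity. The function names and the toy vectors are hypothetical, and extracting the embeddings from wav2vec-BERT and ReDimNet is not shown.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def fuse_embeddings(w2v_bert_emb: Sequence[float], redimnet_emb: Sequence[float]) -> List[float]:
    """Hypothetical fusion: normalize each embedding, concatenate, then renormalize.

    Normalizing first keeps one model's embedding scale from dominating the other.
    """
    fused = l2_normalize(w2v_bert_emb) + l2_normalize(redimnet_emb)  # list concatenation
    return l2_normalize(fused)


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings (sizes are arbitrary here).
    enroll = fuse_embeddings([0.2, -0.1, 0.7], [1.0, 0.5])
    test = fuse_embeddings([0.1, -0.2, 0.6], [0.9, 0.4])
    print(f"cosine score: {cosine_score(enroll, test):.4f}")
```

With this particular choice, the cosine score of two fused vectors equals the average of the two per-model cosine scores, so it behaves like equal-weight score fusion; a weighted or learned fusion would not have that property.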

Abstract

This paper introduces an efficient and accurate pipeline for text-dependent speaker verification (TdSV), designed to meet the need for high-performance biometric systems. The proposed system uses a Fast-Conformer-based ASR module to validate the spoken content, filtering out Target-Wrong (TW) and Impostor-Wrong (IW) trials. For speaker verification, we propose a feature fusion approach that combines speaker embeddings extracted from wav2vec-BERT and ReDimNet models into a unified speaker representation. The system achieves competitive results on the TdSV 2024 Challenge test set, ranking second with a normalized min-DCF of 0.0452, which highlights its balance of accuracy and robustness.
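A normalized min-DCF figure like the one quoted above can be computed from trial scores and target/non-target labels. The Python sketch below sweeps every realizable decision threshold and divides the lowest detection cost by the cost of the better trivial system (always accept or always reject). The default cost parameters (C_miss = 10, C_fa = 1, P_target = 0.01) are assumptions in the style of NIST SRE08, not values taken from this paper, and should be checked against the challenge evaluation plan.

```python
from typing import Sequence


def min_normalized_dcf(
    scores: Sequence[float],
    is_target: Sequence[bool],
    p_target: float = 0.01,  # assumed target prior
    c_miss: float = 10.0,    # assumed cost of a miss
    c_fa: float = 1.0,       # assumed cost of a false alarm
) -> float:
    """Minimum detection cost over all thresholds, divided by the trivial-system cost."""
    if len(scores) != len(is_target):
        raise ValueError("scores and labels must have the same length")
    n_tar = sum(1 for t in is_target if t)
    n_non = len(is_target) - n_tar
    if n_tar == 0 or n_non == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Cost of the better trivial system (always reject vs. always accept).
    c_default = min(c_miss * p_target, c_fa * (1.0 - p_target))

    # Sort ascending by score; a trial is accepted when its score exceeds the threshold.
    trials = sorted(zip(scores, is_target), key=lambda pair: pair[0])

    # Threshold below every score: accept all, so P_miss = 0 and P_fa = 1.
    misses, false_alarms = 0, n_non
    best = c_fa * (1.0 - p_target)

    for i, (score, target) in enumerate(trials):
        # Raising the threshold past this score rejects this trial.
        if target:
            misses += 1
        else:
            false_alarms -= 1
        # Tied scores cannot be split by a threshold: evaluate only after the last one.
        if i + 1 < len(trials) and trials[i + 1][0] == score:
            continue
        p_miss = misses / n_tar
        p_fa = false_alarms / n_non
        cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        best = min(best, cost)

    return best / c_default


if __name__ == "__main__":
    scores = [0.82, 0.75, 0.31, 0.12, 0.64, 0.05]
    labels = [True, True, False, False, True, False]
    print(f"normalized min-DCF: {min_normalized_dcf(scores, labels):.4f}")
```

Under this normalization, a scorer that perfectly separates targets from non-targets yields 0.0, and one that does no better than the trivial decision yields 1.0.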

Paper Structure

This paper contains 18 sections, 1 equation, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Overview of how the ASR and speaker verification models interact in the pipeline, including the filtering of IW and TW trials and the scoring of speaker verification trials with cosine similarity.
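The filtering step shown in Figure 1 can be sketched as a gate in front of the speaker scorer: if the ASR transcript does not match the prompted phrase, the trial is treated as TW or IW and rejected; otherwise it receives the speaker verification score. The matching rule (text normalization plus a difflib similarity ratio with a hypothetical 0.8 threshold) and the REJECT_SCORE sentinel are illustrative assumptions, not details from the paper; the transcript would come from the Fast-Conformer ASR module, which is not modeled here.

```python
import difflib
import re
from typing import Callable

# Hypothetical sentinel for content-filtered trials; it lies below the cosine range [-1, 1].
REJECT_SCORE = -2.0


def normalize_text(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace before comparison."""
    # In Python 3, \w matches Unicode word characters, so non-Latin scripts are kept.
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def phrase_matches(transcript: str, expected_phrase: str, min_ratio: float = 0.8) -> bool:
    """Hypothetical content check: accept if the transcript is close to the prompted phrase."""
    ratio = difflib.SequenceMatcher(
        None, normalize_text(transcript), normalize_text(expected_phrase)
    ).ratio()
    return ratio >= min_ratio


def score_trial(
    transcript: str,
    expected_phrase: str,
    speaker_score: Callable[[], float],
) -> float:
    """Reject wrong-phrase trials (TW/IW); otherwise return the speaker verification score."""
    if not phrase_matches(transcript, expected_phrase):
        return REJECT_SCORE
    # Only trials with the correct phrase reach the (more expensive) speaker scorer.
    return speaker_score()


if __name__ == "__main__":
    phrase = "my voice is my password"
    print(score_trial("My voice is my password.", phrase, lambda: 0.87))  # prints 0.87
    print(score_trial("open the door", phrase, lambda: 0.87))  # prints -2.0
```

Passing the speaker scorer as a zero-argument callable means embeddings are computed only for trials that pass the content check, which is one way such a filter can also save compute.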