XWSB: A Blend System Utilizing XLS-R and WavLM with SLS Classifier detection system for SVDD 2024 Challenge

Qishan Zhang; Shuangbing Wen; Fangke Yan; Tao Hu; Jun Li

TL;DR

The proposed XWSB system blends XLS-R and WavLM models, each paired with an SLS classifier, and achieves SOTA performance in the SVDD 2024 Challenge with an EER of 2.32% in the CtrSVDD track.

Abstract

This paper introduces the model structure we used in the SVDD 2024 Challenge, which was held for the first time this year. Singing voice deepfake detection (SVDD) faces complexities due to informal speech intonations and varying speech rates. We propose the XWSB system, which achieved SOTA performance in the SVDD challenge. XWSB stands for XLS-R, WavLM, and SLS Blend, representing the integration of these technologies for SVDD. Specifically, we adopt XLS-R&SLS, the best-performing model structure on the ASVspoof DF dataset, and apply SLS to WavLM to form the WavLM&SLS structure. Finally, we integrate the two models to form the XWSB system. Experimental results show that our system demonstrates advanced recognition capabilities in the SVDD challenge, achieving an EER of 2.32% in the CtrSVDD track. The code and data can be found at https://github.com/QiShanZhang/XWSB_for_SVDD2024.
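To make the reported metric concrete, the following is a minimal sketch of how an equal error rate (EER) can be computed from detector scores, together with one simple way of blending two detectors. The abstract does not state how the two models are integrated, so the weighted score average shown here is an assumption, and the function names `compute_eer` and `blend_scores` are hypothetical rather than taken from the authors' repository.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold); higher scores are assumed to mean 'more bona fide'."""
    bona = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Candidate thresholds: every distinct score observed.
    thresholds = np.unique(np.concatenate([bona, spoof]))
    # False rejection rate: bona fide samples scored below the threshold.
    frr = np.array([(bona < t).mean() for t in thresholds])
    # False acceptance rate: spoofed samples scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # The EER is read off where the two error rates are closest.
    idx = int(np.argmin(np.abs(frr - far)))
    return (frr[idx] + far[idx]) / 2.0, thresholds[idx]


def blend_scores(scores_a, scores_b, weight_a=0.5):
    """Assumed fusion rule: weighted average of two models' scores."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight_a * a + (1.0 - weight_a) * b
```

In this sketch, the scores of two separately trained detectors would first be passed through `blend_scores`, and the blended scores for bona fide and spoofed samples would then be passed to `compute_eer`.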

Paper Structure (13 sections, 5 equations, 2 figures, 3 tables)

Introduction
METHOD
Problem modeling
WavLM Model
XLS-R Model
SLS classifier
Model Ensemble
EXPERIMENT
Datasets and metrics
Experiment Setup
Experiment result for CtrSVDD track and analysis
Conclusion
ACKNOWLEDGEMENT

Figures (2)

Figure 1: Pipeline of our proposed model. Left part: the WavLM model and the XLS-R model; right part: the SLS classifier.
Figure 2: Learned weights of the SLS module, visualized for five randomly selected test-set audio samples. Rows 0-4 show the combination of WavLM and SLS, while rows 5-9 show the combination of XLS-R and SLS.
