SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan
You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Tomoki Toda, Zhiyao Duan
TL;DR
The SVDD Challenge 2024 introduces the first dedicated evaluation plan for singing voice deepfake detection (SVDD), addressing the distinct challenges of the task with two realistic tracks: controlled (CtrSVDD) and in-the-wild (WildSVDD). It leverages curated datasets, including a licensing-aware CtrSVDD dataset and a substantially enlarged WildSVDD corpus, and uses the equal error rate (EER) as the primary performance metric to assess robustness against unseen generators. Baseline systems built on the AASIST framework with both LFCC and raw waveform front-ends reveal generalization gaps to novel deepfake methods, motivating further research into robust SVDD models. The plan also outlines data usage rules, submission pipelines via CodaBench, and a pathway for disseminating results and system descriptions at SLT 2024, facilitating shared progress and reproducibility in singing voice deepfake detection.
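For readers unfamiliar with the equal error rate (EER), the following is a minimal sketch of how it can be computed from detector scores. It is not the challenge's official scoring script. It assumes that a higher score means "more likely bonafide", and the function name `compute_eer` and the example scores are hypothetical.

```python
import numpy as np


def compute_eer(bonafide_scores, deepfake_scores):
    """Return (eer, threshold) for two sets of detector scores.

    Assumption (not stated in the source): higher score = more likely bonafide.
    A trial is accepted as bonafide when its score is >= threshold.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    deepfake = np.asarray(deepfake_scores, dtype=float)

    # Candidate thresholds: every observed score, plus one value above the
    # maximum so that the false acceptance rate can reach zero.
    thresholds = np.unique(np.concatenate([bonafide, deepfake]))
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)

    # False acceptance rate: deepfakes accepted as bonafide.
    far = np.array([(deepfake >= t).mean() for t in thresholds])
    # False rejection rate: bonafide trials rejected as deepfake.
    frr = np.array([(bonafide < t).mean() for t in thresholds])

    # The EER is the operating point where the two error rates are closest;
    # with finite data they may not meet exactly, so average the two.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.7, 0.2])
    print(f"EER = {eer:.2%} at threshold {thr:.2f}")
```

A lower EER indicates better separation between bonafide and deepfake recordings; a detector that separates the two classes perfectly reaches an EER of 0.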
Abstract
The rapid advancement of AI-generated singing voices, which now closely mimic natural human singing and align seamlessly with musical scores, has led to heightened concerns for artists and the music industry. Unlike spoken voice, singing voice presents unique challenges due to its musical nature and the presence of strong background music, making singing voice deepfake detection (SVDD) a specialized field requiring focused attention. To promote SVDD research, we recently proposed the "SVDD Challenge," the very first research challenge focusing on SVDD for lab-controlled and in-the-wild bonafide and deepfake singing voice recordings. The challenge will be held in conjunction with the 2024 IEEE Spoken Language Technology Workshop (SLT 2024).
