ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang, Hector Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi
TL;DR
ASVspoof 5 addresses spoofing and deepfake threats to automatic speaker verification (ASV) using crowdsourced, acoustically diverse speech data and, for the first time, adversarial attacks. It has two tracks: stand-alone countermeasure (CM) detection and spoofing-robust ASV (SASV). It provides a new database built on MLS and updated attack types, including adversarial variants, with an open data condition to broaden evaluation. The primary metric is the minimum detection cost function (minDCF) for Track 1 and the architecture-agnostic DCF (a-DCF) for Track 2, complemented by the tandem DCF (t-DCF) and tandem equal error rate (t-EER), enabling assessment of both detection and calibration. Top submissions improve substantially on the baselines even with non-studio data, and features from self-supervised learning (SSL) models frequently improve performance, though score calibration remains a critical bottleneck for practical deployment.
Abstract
ASVspoof 5 is the fifth edition in a series of challenges that promote the study of speech spoofing and deepfake attacks, and the design of detection solutions. Compared to previous challenges, the ASVspoof 5 database is built from crowdsourced data collected from a vastly greater number of speakers in diverse acoustic conditions. Attacks, also crowdsourced, are generated and tested using surrogate detection models, while adversarial attacks are incorporated for the first time. New metrics support the evaluation of spoofing-robust automatic speaker verification (SASV) as well as stand-alone detection solutions, i.e., countermeasures without ASV. We describe the two challenge tracks, the new database, the evaluation metrics, baselines, and the evaluation platform, and present a summary of the results. Attacks significantly compromise the baseline systems, while submissions bring substantial improvements.
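To make the minDCF metric named in the TL;DR for Track 1 concrete, the following is a minimal sketch of a normalized minimum detection cost computed from countermeasure scores. It is not the challenge's official scoring code: the function name `min_dcf`, the score convention (higher means more likely bona fide), and the default cost parameters `p_spoof`, `c_miss`, and `c_fa` are illustrative assumptions, and the official evaluation plan should be consulted for the actual values.

```python
def min_dcf(bonafide_scores, spoof_scores, p_spoof=0.05, c_miss=1.0, c_fa=10.0):
    """Minimum normalized detection cost over all decision thresholds.

    Assumptions (not taken from the source text): higher scores mean
    "more likely bona fide", a trial is accepted when its score is above
    the threshold, and the default priors/costs are placeholders.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one score per class")

    n_bona = len(bonafide_scores)
    n_spoof = len(spoof_scores)

    # Label 1 = bona fide, 0 = spoof; sort all trials by ascending score.
    trials = sorted([(s, 1) for s in bonafide_scores] +
                    [(s, 0) for s in spoof_scores])

    cost_miss = c_miss * (1.0 - p_spoof)  # weight on rejecting bona fide speech
    cost_fa = c_fa * p_spoof              # weight on accepting spoofed speech
    norm = min(cost_miss, cost_fa)        # cost of the best trivial system

    # Threshold below every score: accept everything (P_miss = 0, P_fa = 1).
    best = cost_fa
    rejected_bona = 0
    rejected_spoof = 0
    i = 0
    while i < len(trials):
        # Raise the threshold just above the current score, rejecting all
        # trials tied at that score in one step.
        score = trials[i][0]
        while i < len(trials) and trials[i][0] == score:
            if trials[i][1] == 1:
                rejected_bona += 1
            else:
                rejected_spoof += 1
            i += 1
        p_miss = rejected_bona / n_bona
        p_fa = 1.0 - rejected_spoof / n_spoof
        best = min(best, cost_miss * p_miss + cost_fa * p_fa)

    return best / norm


if __name__ == "__main__":
    # Toy scores, purely illustrative.
    print(min_dcf([0.9, 0.8, 0.3], [0.1, 0.2, 0.5]))
```

Because the minimum is taken over thresholds, this quantity reflects discrimination only. It says nothing about whether the scores are well calibrated at a fixed operating threshold, which is why calibration has to be assessed separately.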
