MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Edresson Casanova, Eren Gölge, Thorsten Müller, Piotr Syga, Philip Sperl, Konstantin Böttinger
TL;DR
This paper introduces MLAAD, a multilingual audio anti-spoofing dataset (version 8) comprising 570.3 hours of synthetic speech in 40 languages, generated by 119 TTS models. The authors train and evaluate three state-of-the-art deepfake detectors and find that models trained on MLAAD generalize well across datasets: as a training resource, MLAAD outperforms InTheWild and Fake-Or-Real and complements ASVspoof 2019. To build the dataset, the authors translate English transcripts where needed, synthesize speech with a variety of TTS architectures, and organize the data with rich metadata to support supervised learning. They also publish an interactive webserver to democratize access to anti-spoofing tools, underscoring the practical value of multilingual data for mitigating audio deepfakes globally.
Abstract
Text-to-Speech (TTS) technology offers notable benefits, such as providing a voice for individuals with speech impairments, but it also facilitates the creation of audio deepfakes and spoofing attacks. AI-based detection methods can help mitigate these risks; however, the performance of such models depends inherently on the quality and diversity of their training data. At present, the available datasets are heavily skewed towards English and Chinese audio, which limits the global applicability of anti-spoofing systems. To address this limitation, this paper presents the Multi-Language Audio Anti-Spoofing Dataset (MLAAD), version 8, created using 119 TTS models spanning 58 different architectures to generate 570.3 hours of synthetic speech in 40 different languages. We train and evaluate three state-of-the-art deepfake detection models with MLAAD and observe that, when used as a training resource, it outperforms comparable datasets such as InTheWild and Fake-Or-Real. Moreover, MLAAD proves complementary to the renowned ASVspoof 2019 dataset: in tests across eight datasets, MLAAD and ASVspoof 2019 each outperformed the other on four. By publishing MLAAD and making a trained model available via an interactive webserver, we aim to democratize anti-spoofing technology, making it accessible beyond the realm of specialists and contributing to global efforts against audio spoofing and deepfakes.
