Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing

Tianchi Liu; Ivan Kukanov; Zihan Pan; Qiongqiong Wang; Hardik B. Sailor; Kong Aik Lee

TL;DR

This work evaluates top-performing speech anti-spoofing systems that are trained on English data but tested on other languages, and observes notable performance declines. It proposes Accent-based data expansion via TTS (ACCENT), which introduces diverse linguistic knowledge to monolingual-trained models and improves their cross-lingual capabilities.

Abstract

Language mismatch affects speech anti-spoofing systems, yet investigation and quantification of its effects remain limited. Existing anti-spoofing datasets are mainly in English, and the high cost of acquiring multilingual datasets hinders the training of language-independent models. We begin by evaluating top-performing speech anti-spoofing systems that are trained on English data but tested on other languages, and we observe notable performance declines. We then propose Accent-based data expansion via TTS (ACCENT), which introduces diverse linguistic knowledge to monolingual-trained models and improves their cross-lingual capabilities. We conduct experiments on a large-scale dataset of over 3 million samples, comprising 1.8 million training samples and nearly 1.2 million testing samples across 12 languages. The language mismatch effects are preliminarily quantified and are reduced by over 15% when the proposed ACCENT method is applied. This easily implementable method shows promise for multilingual and low-resource language scenarios.

Paper Structure (17 sections, 3 figures, 6 tables)

Introduction
Methodology
Validate the Language Mismatch Effect
Motivations and Hypothesis for ACCENT
Methodology of Creating Dataset using ACCENT
Experimental Setup
Dataset
Training Set
Test Sets
Training Strategy
Results and Discussion
Effects of Language Mismatch on SOTA Models
Evaluation for the proposed ACCENT method
Evaluation on Synthetic Singing Test Set
Quantifying Language Mismatch Effects
...and 2 more sections

Figures (3)

Figure 1: Illustration of language mismatch effects on speech anti-spoofing systems. The English data-trained system works well with English data (left) but fails with other languages (right).
Figure 2: Illustration of the proposed ACCENT method.
Figure 3: The radar charts show the EER (%) performance of systems across ten languages in the VC-CL3 and TTS-CL datasets. Systems without and with the proposed ACCENT method correspond to systems 1 and 4 in Table \ref{tab:accentaug}, respectively. Each system is trained twice, with the better performance displayed in the radar charts.
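
The captions above report performance as EER (%), the equal error rate: the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how such a value could be computed from per-utterance scores. It is not the authors' evaluation code; the function name `compute_eer`, the score convention (a higher score means more likely bona fide), and the simple threshold sweep are all assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return the equal error rate as a fraction in [0, 1].

    Assumed convention (not taken from the paper): a higher score
    means the utterance is more likely bona fide.
    """
    # Candidate thresholds: every distinct score that was observed.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a bona fide utterance scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed utterance scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented purely for illustration.
bonafide = [0.9, 0.8, 0.7, 0.3]
spoof = [0.1, 0.2, 0.4, 0.6]
print(f"EER = {100 * compute_eer(bonafide, spoof):.1f}%")
```

With real systems the sweep would run over many more scores, and evaluation toolkits usually interpolate between thresholds rather than picking the closest one; this sketch only shows the definition.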
