Enhancing Anti-spoofing Countermeasures Robustness through Joint Optimization and Transfer Learning

Yikang Wang; Xingming Wang; Hiromitsu Nishizaki; Ming Li

TL;DR

The paper addresses the vulnerability of anti-spoofing countermeasures (CM) to noisy and reverberant conditions. It introduces TL-SEJ, a framework that couples a dual-input Unet-based speech enhancement front-end with a Conformer back-end pre-trained on ASR data, and optimizes the two jointly with a combined loss to improve robustness. Empirical results show notable improvements across multiple SNR levels and reverberation scenarios, with additional gains in cross-dataset generalization, which points to practical benefits for real-world deployment. Performance under severe babble noise remains challenging, but the approach advances noise and reverberation resilience in synthesized-speech detection and offers a pathway toward robust, transfer-learning-driven CM systems.
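To make the idea of a combined loss concrete, the following is a minimal sketch and not the paper's actual objective. It assumes the joint loss is a weighted sum of a spoof-classification term (binary cross-entropy) and an enhancement term (mean squared error between enhanced and clean features). The function names and the `se_weight` parameter are hypothetical and chosen only for illustration.

```python
import math


def binary_cross_entropy(p_spoof, label):
    """Cross-entropy for one utterance; label is 1 for spoof, 0 for bona fide."""
    eps = 1e-12
    p = min(max(p_spoof, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def mse(enhanced, clean):
    """Mean squared error between enhanced and clean feature values."""
    return sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)


def joint_loss(enhanced, clean, p_spoof, label, se_weight=0.5):
    """Weighted sum of the back-end and front-end losses.

    se_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return binary_cross_entropy(p_spoof, label) + se_weight * mse(enhanced, clean)
```

In a joint setup of this kind, gradients of the classification term flow back through the enhancement front-end, so the front-end is pushed toward outputs that help detection rather than only toward a clean reconstruction.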

Abstract

Current research in synthesized speech detection primarily focuses on the generalization of detection systems to unknown spoofing methods on noise-free speech. However, anti-spoofing countermeasure (CM) systems often perform worse in more challenging scenarios, such as those involving noise and reverberation. To improve the robustness of CM systems, we propose a transfer learning-based speech enhancement front-end joint optimization (TL-SEJ) method and investigate its effectiveness against noise and reverberation. We evaluated the proposed method through a series of comparative and ablation experiments. The experimental results show that, across different signal-to-noise ratio test conditions, the proposed TL-SEJ method improves recognition accuracy by 2.7% to 15.8% compared to the baseline. Compared to conventional data augmentation methods, our system achieves an accuracy improvement ranging from 0.7% to 5.8% in various noisy conditions and from 1.7% to 2.8% under different RT60 reverberation scenarios. These experiments demonstrate that the proposed method effectively enhances system robustness in noisy and reverberant conditions.
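As an illustration of what a signal-to-noise ratio test condition means in practice, the sketch below mixes a noise signal into speech at a target SNR. It uses the standard power-ratio definition of SNR and is not taken from the paper's pipeline. The helper names are hypothetical, and the code assumes both signals have the same length and that the noise has non-zero power.

```python
import math


def power(signal):
    """Average power (mean of squared samples)."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power matches snr_db, then add it.

    Assumes len(speech) == len(noise) and power(noise) > 0.
    """
    target_ratio = 10 ** (snr_db / 10)  # linear power ratio for the target SNR
    scale = math.sqrt(power(speech) / (power(noise) * target_ratio))
    return [s + scale * n for s, n in zip(speech, noise)]
```

Lower `snr_db` values produce a larger noise scale, which corresponds to the harder test conditions.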

Paper Structure (29 sections, 19 equations, 4 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Data Augmentation
Pre-training Model and Transfer Learning
Speech Enhancement
Methodology
Data Augmentation in the Preprocessing Stage
Unet-based Speech Enhancement Before Anti-Spoofing Network
Transfer Learning in the Anti-Spoofing Network Phase
Conformer Model
Transfer Learning with ASR Pre-trained Conformer Model
Joint Training of Speech Enhancement Front-End and Anti-Spoofing Back-End Models
Audio Anti-Spoofing Module
Joint Training with Transfer Learning-Based Back-End
Dataset
...and 14 more sections

Figures (4)

Figure 1: Illustration of a typical pipeline for synthesized speech detection systems. (a) Clean-data-based spoofing detection systems are mainly implemented on these parts. (b) Noisy-data-based spoofing detection systems are mainly implemented on these parts.
Figure 2:
Figure 3: unet
Figure 4: ablation
