Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms

Penghui Wen; Kun Hu; Wenxi Yue; Sen Zhang; Wanlei Zhou; Zhiyong Wang

TL;DR

This work proposes S2pecNet, a deep learning method with a spectral fusion-reconstruction strategy that utilises multi-order spectral patterns to learn robust audio anti-spoofing representations, and achieves state-of-the-art performance (an EER of 0.77% on the ASVspoof2019 LA Challenge).

Abstract

Robust audio anti-spoofing has become increasingly challenging due to recent advances in deepfake techniques. While spectrograms have demonstrated their capability for anti-spoofing, the complementary information present in multi-order spectral patterns has not been well explored, which limits their effectiveness against varying spoofing attacks. Therefore, we propose a novel deep learning method with a spectral fusion-reconstruction strategy, namely S2pecNet, to utilise multi-order spectral patterns for robust audio anti-spoofing representations. Specifically, spectral patterns up to the second order are fused in a coarse-to-fine manner, and two branches are designed for the fine-level fusion from the spectral and temporal contexts. A reconstruction from the fused representation back to the input spectrograms further reduces potential information loss during fusion. Our method achieved state-of-the-art performance with an EER of 0.77% on a widely used dataset, the ASVspoof2019 LA Challenge.
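The abstract reports its headline result as an equal error rate (EER). As a rough illustration of what that metric measures, the sketch below computes an EER from two sets of detector scores. The function name `compute_eer`, the convention that higher scores mean "more likely bona fide", and the simple threshold sweep are assumptions made for this example; this is not the official ASVspoof scoring tool and may differ from it in how ties and interpolation are handled.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate by sweeping candidate thresholds.

    Assumes higher scores indicate bona fide speech (hypothetical convention).
    Returns (eer, threshold) at the point where the false acceptance rate
    and false rejection rate are closest.
    """
    bonafide_scores = np.asarray(bonafide_scores, dtype=float)
    spoof_scores = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([bonafide_scores, spoof_scores]))

    # False rejection: bona fide trials scored below the threshold.
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    # False acceptance: spoofed trials scored at or above the threshold.
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])

    # The EER is the operating point where the two error rates meet.
    idx = np.argmin(np.abs(far - frr))
    eer = (far[idx] + frr[idx]) / 2.0
    return eer, thresholds[idx]
```

With scores that overlap, for example `compute_eer([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.7, 0.2])`, one bona fide and one spoofed trial fall on the wrong side of the best threshold, so a quarter of each class is misclassified at the equal-error point.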

Paper Structure (16 sections, 6 equations, 4 figures, 7 tables)

Introduction
Proposed Method
Raw Spectrogram and Power Spectrogram Encoding
Temporal-Spectral Fusion
Raw Spectrogram and Power Spectrogram Decoding
Spoofing Detection
Model Training
Experiments & Discussions
Dataset and Evaluation Metrics
Implementation Details
Performance Comparison
Ablation Study
Spectral Complementary Patterns
Impact of TSF Module
Impact of Reconstruction Decoders
...and 1 more sections

Figures (4)

Figure 1: Illustration of anti-spoofing performance on the ASVspoof2019 LA Challenge, which is highly sensitive to the order of the spectral features used.
Figure 2: Illustration of the overall architecture for the proposed $\text{S}^2\text{pecNet}$.
Figure 3: Illustration of raw and power spectrograms, where the area in red bounding boxes represents high frequency regions.
Figure 4: Grad-CAM on spectrograms of two examples.
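To make the distinction between first-order and second-order spectral patterns in Figure 3 more concrete, the following is a minimal NumPy sketch. It assumes, purely for illustration, that the first-order pattern is an STFT magnitude spectrogram and the second-order pattern is its element-wise square; the paper's own raw and power spectrogram front-ends may be computed differently (for instance with learned filters). The function names and the `n_fft` and `hop` defaults are hypothetical.

```python
import numpy as np


def stft_magnitude(signal, n_fft=512, hop=160):
    """Magnitude spectrogram of shape (n_fft // 2 + 1, n_frames).

    Uses a Hann window and no padding, so the signal must contain
    at least n_fft samples.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame, then transpose to (frequency, time).
    return np.abs(np.fft.rfft(frames, axis=1)).T


def multi_order_spectrograms(signal, n_fft=512, hop=160):
    """Return (first_order, second_order) spectrograms.

    Assumption for this sketch: second order = squared magnitude (power).
    """
    magnitude = stft_magnitude(signal, n_fft, hop)
    power = magnitude ** 2
    return magnitude, power


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000 * t)  # 1 kHz test tone, 1 second
    mag, power = multi_order_spectrograms(tone)
    print(mag.shape, power.shape)
```

Squaring widens the gap between high-energy and low-energy bins, so the two views emphasise different regions of the same time-frequency plane, which is one plausible reading of why they could carry complementary cues.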
