Advancing Continual Learning for Robust Deepfake Audio Classification

Feiyi Dong; Qingchen Tang; Yichen Bai; Zihan Wang

TL;DR

This work tackles robust deepfake audio detection under unseen spoofing attacks by introducing CADE, a continual learning framework that preserves past knowledge while adapting to new threats. CADE combines a fixed-memory replay strategy with the classification loss $L_c$ and three auxiliary losses: Knowledge Distillation ($L_{kd}$), Attention Distillation via Grad-CAM ($L_{ad}$), and an embedding-based Positive Sample Alignment across multiple layers ($L_{psa}$), summarized as $CADE = Replay + L_c + \alpha L_{kd} + \beta L_{ad} + \gamma L_{psa}$. Empirical evaluation on the ASVspoof2019 dataset shows that CADE consistently outperforms traditional continual learning baselines across different spoofing types and backbones (RawNet2, LFCC-LCNN), achieving a lower equal error rate (EER) even with limited memory. The findings suggest that CADE offers a practical, memory-efficient solution for adaptive, long-term audio anti-spoofing systems.
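The EER used as the metric above is the operating point at which the false acceptance rate on spoofed audio equals the false rejection rate on bona fide audio. The following is a minimal sketch of how it can be estimated from detector scores, assuming that a higher score means "more likely bona fide". The function name `equal_error_rate` and the toy scores are illustrative and are not taken from the paper or from any ASVspoof tooling.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumes a higher score means "more likely bona fide" and that both
    score lists are non-empty.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: bona fide audio scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores: one bona fide clip and one spoofed clip are misranked.
toy_eer = equal_error_rate([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.3, 0.7])
```

With finite score lists the two error rates rarely cross exactly, so this sketch reports the average of the two rates at the threshold where they are closest.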

Abstract

The emergence of new spoofing attacks poses an increasing challenge to audio security. Current detection methods often falter when faced with unseen spoofing attacks. Traditional strategies, such as retraining with new data, are not always feasible because of their extensive storage requirements. This paper introduces a novel continual learning method, Continual Audio Defense Enhancer (CADE). First, by using a fixed memory size to store randomly selected samples from previous datasets, our approach conserves resources and adheres to privacy constraints. Second, we apply two distillation losses in CADE. Through distillation at the classifier, CADE ensures that the student model's behavior closely resembles that of the teacher model. This resemblance helps the model retain old information while facing unseen data. We further refine the model's performance with a novel embedding similarity loss that extends across multiple depth layers, facilitating better positive sample alignment. Experiments on the ASVspoof2019 dataset show that the proposed method outperforms the baseline methods.
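The fixed-memory replay idea can be sketched as follows. This is a minimal illustration, assuming the memory budget is split evenly across previous tasks and that samples are drawn uniformly at random. The abstract only states that samples are randomly selected under a fixed memory size, so the function name `build_replay_memory`, the even split, and the `seed` parameter are assumptions made here.

```python
import random


def build_replay_memory(previous_tasks, memory_size, seed=0):
    """Randomly keep at most `memory_size` samples from earlier tasks.

    `previous_tasks` is a list of lists, one list of samples per earlier task.
    The even per-task split is an assumption made for this sketch.
    """
    if not previous_tasks:
        return []
    rng = random.Random(seed)
    per_task = memory_size // len(previous_tasks)
    memory = []
    for samples in previous_tasks:
        # Never request more samples than the task actually has.
        k = min(per_task, len(samples))
        memory.extend(rng.sample(samples, k))
    return memory


# Toy usage: two earlier tasks and a budget of 4 stored samples.
old_tasks = [[f"a{i}" for i in range(10)], [f"b{i}" for i in range(10)]]
memory = build_replay_memory(old_tasks, memory_size=4)
new_task = [f"c{i}" for i in range(10)]
training_set = new_task + memory  # new data plus replayed old data
```

Because the budget is fixed, storage does not grow with the number of tasks; each earlier task simply receives a smaller share as more tasks arrive.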

Paper Structure (17 sections, 6 equations, 1 figure, 4 tables)

This paper contains 17 sections, 6 equations, 1 figure, 4 tables.

Introduction
Proposed Methods
Proposed Continual Audio Defense Enhancer
Integration of Replay-Based Methods with Fixed Memory Sampling
Knowledge Distillation Loss
Attention Distillation Loss
Improved Positive Sample Alignment
Experiment setting
Dataset
Models
Task Setting
Metric
Result and Analysis
Experiments on significantly different spoofing types
Experiments on the LA series spoofing types
...and 2 more sections

Figures (1)

Figure 1: Method diagram of proposed CADE approach. Specifically, this diagram illustrates the training process at time $t$ for our novel method. At this time step, new data from task $t$ and a subset of data from task $t-1$ (selected using a replay strategy) are combined to form the input. This input is concurrently fed into the model from time $t-1$ (teacher model) for inference and the model at time $t$ (student model) for training. $L_c$ refers to the classification loss. The student model's training is further guided by three loss functions: $L_{\text{ad}}$ (Attention Distillation Loss derived from Grad-CAM), $L_{\text{psa}}$ (Positive Sample Alignment Loss), and $L_{\text{kd}}$ (Knowledge Distillation Loss), which collectively help the student model to retain the teacher model's knowledge.
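The way the losses in the diagram combine can be sketched as a weighted sum that follows the TL;DR formula $L_c + \alpha L_{kd} + \beta L_{ad} + \gamma L_{psa}$, where replay affects the input data rather than adding a loss term. The knowledge distillation term is written here as a temperature-softened KL divergence between teacher and student outputs. That specific form, the temperature value, the default weights, and all function names are assumptions for illustration, since the caption does not give the exact formulas.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs (assumed form)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def total_loss(l_c, l_kd, l_ad, l_psa, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum L_c + alpha * L_kd + beta * L_ad + gamma * L_psa."""
    return l_c + alpha * l_kd + beta * l_ad + gamma * l_psa


# Toy usage: the teacher (time t-1) and student (time t) see the same input.
teacher_logits = [2.0, -1.0]
student_logits = [1.5, -0.5]
l_kd = kd_loss(teacher_logits, student_logits)
# The other three loss values are placeholders, not results from the paper.
loss = total_loss(l_c=0.7, l_kd=l_kd, l_ad=0.2, l_psa=0.1, alpha=0.5)
```

The distillation term is zero when the student reproduces the teacher's outputs exactly and grows as the two diverge, which is what lets it act as a penalty on forgetting.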
