Listen, Analyze, and Adapt to Learn New Attacks: An Exemplar-Free Class Incremental Learning Method for Audio Deepfake Source Tracing

Yang Xiao, Rohan Kumar Das

TL;DR

AnaST tackles audio deepfake source tracing under class-incremental learning by fixing the feature extractor after an initial training phase and replacing gradient-based training with a closed-form analytic classifier update guided by a feature autocorrelation matrix (FAuM). This analytic learning enables one-epoch adaptation per new attack while preserving past knowledge, avoiding the need for exemplars. On ASVspoof 2019 LA and WaveFake, AnaST achieves high accuracy and low forgetting, closely matching joint training performance and outperforming exemplar-free baselines, with competitive memory efficiency versus exemplar-based methods. The approach is practical for online, privacy-preserving, on-device deployment of deepfake source tracing.

Abstract

As deepfake speech becomes common and hard to detect, it is vital to trace its source. Recent work on audio deepfake source tracing (ST) aims to find the origins of synthetic or manipulated speech. However, ST models must adapt to learn new deepfake attacks while retaining knowledge of the previous ones. A major challenge is catastrophic forgetting, where models lose the ability to recognize previously learned attacks. Some continual learning methods help with deepfake detection, but multi-class tasks such as ST introduce additional challenges as the number of classes grows. To address this, we propose an analytic class incremental learning method called AnaST. When new attacks appear, the feature extractor remains fixed, and the classifier is updated with a closed-form analytical solution in one epoch. This approach ensures data privacy, optimizes memory usage, and is suitable for online training. The experiments carried out in this work show that our method outperforms the baselines.
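The closed-form classifier update can be illustrated with a recursive ridge regression on top of frozen features. The block below is a minimal sketch of that general analytic-learning idea, not the paper's exact formulation: the class name `AnalyticClassifier`, the regularization parameter `gamma`, the `demo` helper, and the synthetic Gaussian "embeddings" are all assumptions made for illustration. It keeps only a weight matrix and an inverse regularized feature autocorrelation matrix between phases, so no past samples are stored.

```python
import numpy as np


class AnalyticClassifier:
    """Illustrative recursive ridge-regression head over frozen features."""

    def __init__(self, feat_dim, gamma=1.0):
        # R holds the inverse of the regularized feature autocorrelation
        # matrix; before any data arrives it is (gamma * I)^-1.
        self.R = np.eye(feat_dim) / gamma
        self.W = np.zeros((feat_dim, 0))  # no classes seen yet

    def fit_phase(self, X, y, num_classes):
        """Single pass over one phase: X is (n, d), y holds integer labels.

        num_classes is the total number of classes seen so far,
        including the ones introduced in this phase.
        """
        # Add zero columns for the newly introduced classes.
        pad = num_classes - self.W.shape[1]
        if pad > 0:
            self.W = np.hstack([self.W, np.zeros((self.W.shape[0], pad))])
        Y = np.eye(num_classes)[y]  # one-hot targets
        # Woodbury identity: fold X^T X into R without revisiting old data.
        K = np.linalg.inv(np.eye(X.shape[0]) + X @ self.R @ X.T)
        self.R = self.R - self.R @ X.T @ K @ X @ self.R
        # Correct the weights using the residual on the new phase only.
        self.W = self.W + self.R @ X.T @ (Y - X @ self.W)

    def predict(self, X):
        return np.argmax(X @ self.W, axis=1)


def demo(seed=0):
    """Compare the incremental solution with joint ridge regression."""
    rng = np.random.default_rng(seed)
    d, gamma = 16, 1.0
    # Synthetic stand-ins for frozen-extractor embeddings of 4 attack classes.
    means = rng.normal(size=(4, d)) * 3.0
    phases = [[0, 1], [2], [3]]  # classes that arrive in each phase
    clf = AnalyticClassifier(d, gamma)
    xs, ys, seen = [], [], 0
    for classes in phases:
        X = np.vstack([means[c] + rng.normal(size=(50, d)) for c in classes])
        y = np.repeat(classes, 50)
        seen += len(classes)
        clf.fit_phase(X, y, seen)
        xs.append(X)
        ys.append(y)
    X_all, y_all = np.vstack(xs), np.concatenate(ys)
    # Joint training baseline: ridge regression on all data at once.
    W_joint = np.linalg.solve(
        X_all.T @ X_all + gamma * np.eye(d), X_all.T @ np.eye(4)[y_all]
    )
    return clf, W_joint, X_all, y_all
```

Under these assumptions, calling `demo()` returns incremental weights that agree with the joint ridge solution up to numerical error. This is the sense in which a linear analytic classifier can adapt to new attacks without forgetting earlier ones; in practice the inputs would be embeddings from the fixed feature extractor rather than synthetic vectors.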

Paper Structure

This paper contains 14 sections, 8 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: An overview of the proposed AnaST method for Task $\tau_t$. We proceed to the class incremental learning stage, where the model adapts by analytic learning for one epoch per new dataset phase, assisted by a correlation matrix (Eq. \ref{eq_R_update}) that encodes past knowledge. This process enables the model to learn new tasks while preserving previously acquired information.
  • Figure 2: Task-wise comparison of accuracy (ACC, %).