Listen, Analyze, and Adapt to Learn New Attacks: An Exemplar-Free Class Incremental Learning Method for Audio Deepfake Source Tracing
Yang Xiao, Rohan Kumar Das
TL;DR
AnaST tackles audio deepfake source tracing under class-incremental learning by fixing the feature extractor after an initial training phase and replacing gradient-based training with a closed-form analytic classifier update guided by a feature autocorrelation matrix (FAuM). This analytic learning enables one-epoch adaptation per new attack while preserving past knowledge, avoiding the need for exemplars. On ASVspoof 2019 LA and WaveFake, AnaST achieves high accuracy and low forgetting, closely matching joint training performance and outperforming exemplar-free baselines, with competitive memory efficiency versus exemplar-based methods. The approach is practical for online, privacy-preserving, on-device deployment of deepfake source tracing.
Abstract
As deepfake speech becomes common and hard to detect, it is vital to trace its source. Recent work on audio deepfake source tracing (ST) aims to find the origins of synthetic or manipulated speech. However, ST models must adapt to learn new deepfake attacks while retaining knowledge of the previous ones. A major challenge is catastrophic forgetting, where models lose the ability to recognize previously learned attacks. Some continual learning methods help with deepfake detection, but multi-class tasks such as ST introduce additional challenges as the number of classes grows. To address this, we propose an analytic class incremental learning method called AnaST. When new attacks appear, the feature extractor remains fixed, and the classifier is updated with a closed-form analytical solution in one epoch. Because no samples of earlier attacks need to be stored, this approach preserves data privacy, reduces memory usage, and is suitable for online training. Our experiments show that the method outperforms the baselines.
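To make the closed-form, one-epoch classifier update concrete, the following is a minimal sketch of a generic recursive ridge-regression classifier, not the authors' implementation. It assumes that features already come from the frozen extractor and that the classifier is a linear map fitted by regularized least squares. The names AnalyticClassifier, fit_phase, and gamma are hypothetical and used for illustration only. The matrix R plays the role of an inverse regularized feature autocorrelation matrix: it summarizes all past features, so earlier samples never need to be revisited.

```python
import numpy as np


class AnalyticClassifier:
    """Linear classifier updated in closed form, one pass per incremental phase.

    Illustrative sketch (assumption): recursive ridge regression over features
    produced by a frozen feature extractor. No past samples are stored.
    """

    def __init__(self, feat_dim, gamma=1.0):
        # R is the inverse of the regularized feature autocorrelation matrix,
        # initialized as (gamma * I)^-1 before any data is seen.
        self.R = np.eye(feat_dim) / gamma
        # Weight matrix with one column per known class (none yet).
        self.W = np.zeros((feat_dim, 0))

    def fit_phase(self, X, labels, num_classes):
        """Absorb one phase of data: X is (n, feat_dim), labels are class ids."""
        # Add zero columns for classes that appear for the first time.
        extra = num_classes - self.W.shape[1]
        if extra > 0:
            pad = np.zeros((self.W.shape[0], extra))
            self.W = np.hstack([self.W, pad])
        Y = np.eye(num_classes)[labels]  # one-hot targets over all classes so far

        # Woodbury identity: update the inverse autocorrelation matrix
        # using only the features of the current phase.
        K = np.linalg.inv(np.eye(X.shape[0]) + X @ self.R @ X.T)
        self.R = self.R - self.R @ X.T @ K @ X @ self.R

        # Recursive least-squares correction of the classifier weights.
        self.W = self.W + self.R @ X.T @ (Y - X @ self.W)

    def predict(self, X):
        return np.argmax(X @ self.W, axis=1)
```

Under these assumptions, the weights after several phases coincide with the ridge-regression solution computed on all data at once, which is one way to read the claim that an analytic update can approach joint-training performance without exemplars.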
