DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes
Xilin Jiang, Yinghao Aaron Li, Nima Mesgarani
TL;DR
DeCoR tackles catastrophic forgetting in lifelong audio representation learning by distilling prior knowledge into the current model: the model learns to predict indices from a delayed codebook, so no past data or teacher model has to be stored. The delayed codebook is constructed at task boundaries, and an index predictor trained against it regularizes the new model, which keeps storage and computation overhead minimal. On TAU Urban Acoustic Scenes, DeCoR consistently improves final seen accuracy $A_T$ and reduces forgetting $F_T$ in both supervised and self-supervised setups, outperforms replay and standard distillation baselines, and combines well with SimCLR. The approach is a lightweight, scalable option for continual audio representation learning, with potential applicability to other audio tasks and online settings.
Abstract
Lifelong audio feature extraction involves learning new sound classes incrementally, which is essential for adapting to new data distributions over time. However, optimizing the model only on new data can lead to catastrophic forgetting of previously learned tasks, which undermines the model's ability to perform well over the long term. This paper introduces a new approach to continual audio representation learning called DeCoR. Unlike other methods that store previous data, features, or models, DeCoR indirectly distills knowledge from an earlier model to the latest by predicting quantization indices from a delayed codebook. We demonstrate that DeCoR improves acoustic scene classification accuracy and integrates well with continual self-supervised representation learning. Our approach introduces minimal storage and computation overhead, making it a lightweight and efficient solution for continual learning.
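To make the idea of predicting quantization indices from a delayed codebook concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes that the codebook is built with plain k-means on features taken at a task boundary, that the indices are assigned once at that point and then kept as training targets, and that the index predictor is a single linear layer trained with cross-entropy. The function names (`build_delayed_codebook`, `assign_codes`, `index_prediction_loss`) and all shapes are hypothetical choices made for illustration.

```python
import numpy as np


def assign_codes(features, codebook):
    """Return, for each feature vector, the index of its nearest code (squared L2)."""
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def build_delayed_codebook(features, num_codes, num_iters=10, seed=0):
    """Fit a codebook with plain k-means (an assumption of this sketch).

    `features` stands in for encoder outputs computed at a task boundary,
    before the encoder is updated on the new task.
    """
    rng = np.random.default_rng(seed)
    init = rng.choice(len(features), size=num_codes, replace=False)
    codebook = features[init].copy()
    for _ in range(num_iters):
        idx = assign_codes(features, codebook)
        for k in range(num_codes):
            members = features[idx == k]
            if len(members) > 0:  # leave empty clusters unchanged
                codebook[k] = members.mean(axis=0)
    return codebook


def index_prediction_loss(new_features, delayed_idx, W, b):
    """Cross-entropy between a linear index predictor and the delayed indices.

    Minimizing this term alongside the new-task loss is the regularizer
    in this sketch; W and b are the (hypothetical) predictor parameters.
    """
    logits = new_features @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(delayed_idx)), delayed_idx].mean()


# Toy usage with random stand-ins for encoder features.
rng = np.random.default_rng(0)
boundary_features = rng.normal(size=(64, 8))
codebook = build_delayed_codebook(boundary_features, num_codes=4)
delayed_idx = assign_codes(boundary_features, codebook)  # stored as targets
W, b = np.zeros((8, 4)), np.zeros(4)  # untrained predictor
loss = index_prediction_loss(boundary_features, delayed_idx, W, b)
```

In this sketch only the integer indices (and, if needed, the codebook) are carried across the task boundary, which is one way to read the claim of minimal storage overhead. In a real training loop, `new_features` would come from the encoder being updated, and the loss would be backpropagated through both the predictor and the encoder.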
