UCIL: An Unsupervised Class Incremental Learning Approach for Sound Event Detection
Yang Xiao, Rohan Kumar Das
TL;DR
This paper addresses sound event detection (SED) systems that must learn new sound classes incrementally without retraining from scratch. It proposes UCIL, an unsupervised class incremental learning framework with three components: it isolates the learning of new classes (independent learning), preserves past knowledge through the distillation losses $L_{dis}^P$ and $L_{dis}^F$, and leverages unlabeled data via sample selection and a balanced rehearsal memory. The learning objective combines $L_{cls}$, $L_{dis}^P$, $L_{dis}^F$, and $L_{dis}^U$. The method is evaluated on the DCASE 2023 Task 4A dataset under two-task and four-task settings, where it achieves competitive PSDS1/PSDS2 scores and clear gains over baseline continual learning approaches, particularly in reducing class confusion (PSDS2) as the number of tasks increases. These results indicate that UCIL can maintain detection performance while expanding its repertoire of sound events, offering a practical path toward real-world, dynamic SED systems that learn from both labeled and unlabeled data.
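For orientation, one common way to combine loss terms like those named above is a weighted sum. The form below is only an assumed sketch: the weights $\lambda_P$, $\lambda_F$, and $\lambda_U$ are hypothetical symbols introduced here for illustration, and this summary does not state how the paper actually weights or schedules the terms.

```latex
L = L_{cls} + \lambda_P L_{dis}^{P} + \lambda_F L_{dis}^{F} + \lambda_U L_{dis}^{U}
```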
Abstract
This work explores class-incremental learning (CIL) for sound event detection (SED), advancing adaptability towards real-world scenarios. CIL's success in domains like computer vision inspired our SED-tailored method, which addresses the unique challenges of diverse and complex audio environments. Our approach employs an independent unsupervised learning framework with a distillation loss function to integrate new sound classes while preserving SED model consistency across incremental tasks. We further enhance this framework with a sample selection strategy for unlabeled data and a balanced exemplar update mechanism, ensuring varied and illustrative sound representations. We evaluate various continual learning methods on the DCASE 2023 Task 4 dataset, and the results offer insights into each method's applicability to real-world SED systems to which new sound classes may be added. The findings also delineate future directions for CIL in dynamic audio settings.
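The idea of a balanced exemplar update can be illustrated with a minimal sketch. It assumes a fixed memory budget split evenly across all classes seen so far, with each class list already ordered from most to least preferred. The function name `rebalance_memory`, the `budget` parameter, and the keep-the-first-items policy are hypothetical choices made for illustration; the abstract does not specify the paper's actual selection criterion.

```python
from typing import Dict, Hashable, List


def rebalance_memory(
    memory: Dict[Hashable, List[str]],
    new_samples: Dict[Hashable, List[str]],
    budget: int,
) -> Dict[Hashable, List[str]]:
    """Merge new exemplars into the memory and give every class an equal quota.

    Assumption (not from the paper): each list is ordered from most to least
    preferred, so trimming keeps the first `quota` items of each class.
    """
    # Copy the lists so the caller's memory is not mutated.
    merged = {label: list(items) for label, items in memory.items()}
    for label, items in new_samples.items():
        merged.setdefault(label, []).extend(items)

    if not merged:
        return {}

    # Equal share of the budget per class; any remainder is left unused.
    quota = budget // len(merged)
    return {label: items[:quota] for label, items in merged.items()}
```

With a budget of 6, two old classes holding four exemplars each, and one new class with three candidates, every class ends up with two exemplars, so the old classes shrink to make room for the new one.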
