Class-Incremental Learning for Multi-Label Audio Classification

Manjunath Mulimani; Annamaria Mesaros

TL;DR

This work tackles class-incremental learning (CIL) for multi-label audio classification, where potentially overlapping sounds must be learned across a sequence of tasks. It proposes an independent-learning framework (IndL) augmented with two distillation losses: a cosine-similarity feature distillation loss ($L^{FD}$) and a KL-divergence output distillation loss ($L^{OD}$), combined in an adaptive loss to preserve old knowledge while adding new classes. The proposed IODFD method (IndL with both $L^{OD}$ and $L^{FD}$) outperforms the baselines on a 50-class AudioSet subset, achieving an average F1-score of $40.9\%$ with minimal forgetting ($Fr=0.7$ pp) across five phases, while remaining competitive with non-incremental training. These results demonstrate an effective balance between plasticity for new sounds and stability for old ones in multi-label audio CIL, with implications for scalable audio tagging and potential extensions to exemplar-based strategies and event detection.
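To make the roles of the loss terms concrete, here is a minimal sketch in plain Python of how they might fit together for a single example. It is not the authors' implementation. The function names, the per-class Bernoulli form of the KL term, the `1 - cosine` form of the feature term, and the fixed weight `lam` are all assumptions of this sketch; the adaptive weighting used in the paper is not specified in this summary. The binary cross-entropy term over the new-class logits follows the description in the Figure 1 caption.

```python
import math

EPS = 1e-12  # numerical guard; an assumption of this sketch


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_new(logits_new, targets_new):
    """Mean binary cross-entropy over the new-class logits only (IndL idea)."""
    total = 0.0
    for o, y in zip(logits_new, targets_new):
        p = min(max(sigmoid(o), EPS), 1.0 - EPS)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(logits_new)


def feature_distillation(feat_cur, feat_prev):
    """1 - cosine similarity between current and frozen previous features."""
    dot = sum(a * b for a, b in zip(feat_cur, feat_prev))
    norm_cur = math.sqrt(sum(a * a for a in feat_cur))
    norm_prev = math.sqrt(sum(b * b for b in feat_prev))
    return 1.0 - dot / (norm_cur * norm_prev + EPS)


def output_distillation(logits_old_cur, logits_old_prev):
    """Mean KL(previous || current) over old classes.

    Assumption: each class output is treated as an independent Bernoulli,
    which is one plausible reading of a KL loss in a multi-label setting.
    """
    total = 0.0
    for o_cur, o_prev in zip(logits_old_cur, logits_old_prev):
        p = min(max(sigmoid(o_cur), EPS), 1.0 - EPS)
        q = min(max(sigmoid(o_prev), EPS), 1.0 - EPS)
        total += q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total / len(logits_old_cur)


def total_loss(logits_cur, feat_cur, logits_prev, feat_prev, targets_new, lam=1.0):
    """BCE on new classes plus weighted distillation terms.

    `lam` is a hypothetical fixed weight standing in for the adaptive
    weighting, which is not given in this summary.
    """
    n_old = len(logits_prev)  # the previous learner only knows the old classes
    l_bce = bce_new(logits_cur[n_old:], targets_new)
    l_od = output_distillation(logits_cur[:n_old], logits_prev)
    l_fd = feature_distillation(feat_cur, feat_prev)
    return l_bce + lam * (l_od + l_fd)
```

In this sketch, both distillation terms vanish when the current learner reproduces the previous learner's features and old-class outputs exactly, so only the new-class classification term drives the update.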

Abstract

In this paper, we propose a method for class-incremental learning of potentially overlapping sounds for solving a sequence of multi-label audio classification tasks. We design an incremental learner that learns new classes independently of the old classes. To preserve knowledge about the old classes, we propose a cosine similarity-based distillation loss that minimizes discrepancy in the feature representations of subsequent learners, and use it along with a Kullback-Leibler divergence-based distillation loss that minimizes discrepancy in their respective outputs. Experiments are performed on a dataset with 50 sound classes, with an initial classification task containing 30 base classes and 4 incremental phases of 5 classes each. After each phase, the system is tested for multi-label classification with the entire set of classes learned so far. The proposed method obtains an average F1-score of 40.9% over the five phases, ranging from 45.2% in phase 0 on 30 classes, to 36.3% in phase 4 on 50 classes. Average performance degradation over incremental phases is only 0.7 percentage points from the initial F1-score of 45.2%.
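The experimental protocol described above (30 base classes, then 4 incremental phases of 5 classes each, with evaluation on all classes learned so far) can be sketched as a simple schedule. The function name and the contiguous class indices are hypothetical; the actual assignment of sound classes to phases is not given here.

```python
def phase_schedule(num_base=30, num_phases=4, classes_per_phase=5):
    """Return, per phase, the new class indices and the evaluation set size.

    Class indices are placeholders (0..49 in order); the real class-to-phase
    assignment is an unknown in this sketch.
    """
    schedule = []
    seen = 0
    for phase in range(num_phases + 1):
        num_new = num_base if phase == 0 else classes_per_phase
        new_classes = list(range(seen, seen + num_new))
        seen += num_new
        schedule.append(
            {
                "phase": phase,
                "new_classes": new_classes,
                # evaluation covers every class learned up to this phase
                "num_eval_classes": seen,
            }
        )
    return schedule


for entry in phase_schedule():
    print(entry["phase"], len(entry["new_classes"]), entry["num_eval_classes"])
```

Under these defaults the evaluation set grows from 30 classes in phase 0 to 50 classes in phase 4, matching the range over which the F1-scores are reported.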

Paper Structure (10 sections, 4 equations, 4 figures, 2 tables)

Introduction
Class-Incremental Learning
Tasks setup and notations
Baselines
Class-incremental learning method
Evaluation and Results
Dataset and training setup
Implementation details and evaluation metrics
Results
Conclusion

Figures (4)

Figure 1: An overview of the proposed CIL approach in incremental phase $i$. Three losses, $\mathcal{L}^{BCE}$, $\mathcal{L}^{OD}$ and $\mathcal{L}^{FD}$, are used to train the current learner $\mathcal{P}^{i}$. $\mathcal{L}^{OD}$ and $\mathcal{L}^{FD}$ minimize the output and feature discrepancy between $\mathcal{P}^{i}$ and the frozen previous learner $\hat{\mathcal{P}}^{i-1}$ to preserve the knowledge of previous classes. $\mathcal{L}^{BCE}$ is computed independently over the logits $\mathbf{o}_{new}$ to learn the new classes.
Figure 2: Number of labels per file in the evaluation set, as the incremental learning of new classes progresses up to 50 classes.
Figure 3: Comparison of (a) F1-score, (b) Fr and (c) mAP across approaches: non-incremental audio tagging (AT), fine-tuning (FT), feature extraction (FE), IndL (without distillation losses), $\mathcal{L}^{OD}$ (without IndL), IOD [mulimani2023incremental], and the proposed IFD and IODFD.
Figure 4: L2-norm of the classification weight vectors for all the classes at the end of incremental learning (phase 4).
