Defying Imbalanced Forgetting in Class Incremental Learning

Shixiong Xu, Gaofeng Meng, Xing Nie, Bolin Ni, Bin Fan, Shiming Xiang

TL;DR

CLass-Aware Disentanglement (CLAD) predicts which old classes are more likely to be forgotten and improves their accuracy, and it can be seamlessly integrated into existing CIL methods.

Abstract

For the first time, we observe that the accuracy of different classes within the same old task is highly imbalanced. This intriguing phenomenon, discovered in replay-based Class Incremental Learning (CIL), reveals that learned classes are forgotten unevenly, since their accuracies are similar before catastrophic forgetting occurs. It has gone unnoticed because CIL is usually evaluated with average incremental accuracy, a metric that implicitly assumes classes within the same task have similar accuracy. This assumption, however, does not hold once catastrophic forgetting occurs. Further empirical studies indicate that this imbalanced forgetting is caused by representation conflicts between semantically similar old and new classes, and that these conflicts are rooted in the data imbalance present in replay-based CIL methods. Building on these insights, we propose CLass-Aware Disentanglement (CLAD) to predict the old classes that are more likely to be forgotten and to improve their accuracy. Importantly, CLAD can be seamlessly integrated into existing CIL methods. Extensive experiments show that CLAD consistently improves current replay-based methods, with performance gains of up to 2.56%.
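The measurement argument can be made concrete: a task's mean accuracy is the same whether all of its classes are retained equally well or some are retained perfectly while others are fully forgotten. The sketch below is illustrative only; the helper names (`per_class_accuracy`, `task_summary`, `average_incremental_accuracy`) are hypothetical, and average incremental accuracy is taken as the mean, over incremental stages, of the accuracy on all classes seen so far.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_class_accuracy(labels, preds):
    """Map each class id to its accuracy, given parallel lists of labels and predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}


def task_summary(class_acc, task_classes):
    """Summarize one task: the mean alone hides how unevenly its classes are retained."""
    accs = [class_acc[c] for c in task_classes]
    return {
        "mean": mean(accs),
        "min": min(accs),
        "max": max(accs),
        "std": pstdev(accs),  # population std over the task's classes
    }


def average_incremental_accuracy(stage_accuracies):
    """Mean of the accuracy on all seen classes, measured after each incremental stage."""
    return mean(stage_accuracies)


if __name__ == "__main__":
    # Toy scenarios for one task (not data from the paper); both have mean accuracy 0.5.
    labels = [0, 0, 1, 1]
    uneven = per_class_accuracy(labels, [0, 0, 0, 0])  # class 1 fully forgotten
    even = per_class_accuracy(labels, [0, 1, 1, 0])    # both classes half correct
    print(task_summary(uneven, [0, 1]))
    print(task_summary(even, [0, 1]))
```

Reporting the per-class minimum or standard deviation next to the mean is one simple way to surface the imbalance that a single averaged number conceals.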

Paper Structure

This paper contains 12 sections, 8 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Demonstration of imbalanced forgetting. Visualization of the accuracy of each class in the first task obtained by joint training and LUCIR. The class indexes are sorted according to the results from LUCIR for better visualization. More illustrations of this phenomenon with other methods can be found in the supplementary material.
  • Figure 2: An overview of (a) existing replay-based methods and (b) our proposed CLAD. In existing replay-based methods, old classes 1, 2, and 3 have different accuracies because of their different similarities to the new class. The limited exemplars are not sufficient to preserve the class boundaries on the test set (low accuracy of classes 2 and 3 in (a)). Our proposed CLAD consists of two parts: Forgetting Prediction (FP) and Representation Disentanglement (RD). FP aims to find the classes that might be forgotten while the new classes are learned (classes 2 and 3). Based on the similarity information from FP, RD encourages the representations of new classes to stay away from similar old ones.
  • Figure 3: Illustration of the relative class forgetting of each old class and its average similarity to classes in later tasks. There is a positive correlation between maximum similarity and forgetting across different settings and methods. The first row shows experiments that begin with 50 classes and add 10 classes in each later task; in the second and third rows, each later task contains 5 and 2 classes, respectively.
  • Figure 4: Ablation studies on the effectiveness of conflict prediction (a), the proportion of conflict classes (a), the impact of the coefficient of the CLAD loss (b), and the components of conflict mitigation (c). Each experiment reports the average incremental accuracy, averaged over three runs with different seeds.
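The two-part design described for Figure 2, together with the similarity/forgetting correlation in Figure 3, can be sketched in plain Python. This page does not reproduce the paper's equations, so everything below is a hypothetical illustration, not the authors' formulation: class prototypes are assumed to be mean feature vectors, the FP step is assumed to score each old class by its maximum cosine similarity to any new-class prototype and flag the top `ratio` fraction (a stand-in for the "proportion of conflict classes"), and the RD term is a hinge-style penalty on cosine similarity that we chose for illustration. The names `prototypes`, `predict_conflicts`, `disentangle_loss`, `pearson`, `ratio`, and `margin` are all ours.

```python
import math


def cosine(u, v):
    """Cosine similarity; the small epsilon guards against zero-norm vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def prototypes(features, labels):
    """Assumed prototype: the mean feature vector of each class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_conflicts(old_protos, new_protos, ratio=0.5):
    """Hypothetical FP step: score each old class by its maximum similarity to any
    new class, then flag the top `ratio` fraction as likely to be forgotten."""
    scores = {c: max(cosine(p, q) for q in new_protos.values())
              for c, p in old_protos.items()}
    k = max(1, math.ceil(ratio * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


def disentangle_loss(new_features, conflict_protos, margin=0.0):
    """Hypothetical RD term: mean positive part of (cosine similarity - margin)
    between new-class features and the flagged old-class prototypes."""
    terms = [max(0.0, cosine(f, p) - margin)
             for f in new_features for p in conflict_protos]
    return sum(terms) / len(terms) if terms else 0.0


def pearson(xs, ys):
    """Pearson correlation, e.g. between per-class similarity and forgetting."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy 2-D prototypes (not from the paper): old class 2 lies closest to new class 3.
    old = {0: [1.0, 0.0], 1: [0.7, 0.7], 2: [0.0, 1.0]}
    new = {3: [0.1, 1.0]}
    flagged, scores = predict_conflicts(old, new, ratio=0.5)
    print("flagged old classes:", flagged)
    print("RD penalty:", disentangle_loss([[0.1, 1.0]], [old[c] for c in flagged]))
```

In an actual training loop, the RD penalty would be computed on tensors in an autodiff framework and added, scaled by a coefficient, to the base replay method's loss; Figure 4(b) ablates the coefficient of the CLAD loss. The `pearson` helper only illustrates the kind of statistic that a similarity-versus-forgetting plot like Figure 3 summarizes.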