Make Domain Shift a Catastrophic Forgetting Alleviator in Class-Incremental Learning

Wei Chen; Yi Zhou

TL;DR

The paper uncovers a counter-intuitive benefit of domain shift for class-incremental learning (CIL): domain variation across tasks reduces forgetting by fostering more separable feature representations. It introduces DisCo, a plug-and-play, contrastive-learning-based method that constructs a prototype pool and enforces task- and class-level margins plus cross-task distillation to suppress interference while preserving old knowledge. Results on CIFAR-100, Fashion-MNIST, and Tiny-ImageNet show substantial reductions in forgetting ($FM$) and gains in average accuracy ($AA$) when DisCo is integrated with diverse CIL methods, including rehearsal-, regularization-, and prompt-based approaches. The approach is validated with multiple metrics ($AA$, $FM$, $PIV$, $PFTS$) and ablations, which highlight the primacy of task-level regularization and the complementary roles of class-level regularization and cross-task distillation. The findings suggest a practical way to improve CIL by simulating domain-shift-induced discriminability at the feature level, with implications for real-world continual learning systems.

Abstract

In class-incremental learning (CIL), alleviating catastrophic forgetting is a pivotal challenge. This paper reports a counter-intuitive observation: incorporating domain shift into CIL tasks significantly reduces the forgetting rate. Our comprehensive studies demonstrate that incorporating domain shift leads to a clearer separation of the feature distributions across tasks and helps reduce parameter interference during learning. Inspired by this observation, we propose a simple yet effective method named DisCo for CIL tasks. DisCo introduces a lightweight prototype pool that uses contrastive learning to keep the feature distribution of the current task distinct from those of previous tasks, effectively mitigating interference across tasks. DisCo can be easily integrated into existing state-of-the-art class-incremental learning methods. Experimental results show that incorporating our method into various CIL methods yields substantial performance improvements, validating the benefit of separating feature representations and reducing interference. These findings indicate that DisCo can serve as a robust basis for future research in class-incremental learning.
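The prototype-pool idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an InfoNCE-style loss over cosine similarities, in which a sample's feature is pulled toward the prototype of the current task and pushed away from prototypes of previous tasks. The function names, the use of mean features as prototypes, and the temperature value are hypothetical choices made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_prototype(features):
    """Assumed prototype definition: the element-wise mean of a task's features."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def task_contrastive_loss(feature, current_proto, previous_protos, temperature=0.1):
    """InfoNCE-style task-level loss (an assumption, not the paper's exact formula).

    The current-task prototype is the positive; every previous-task
    prototype in the pool is a negative.
    """
    pos = cosine(feature, current_proto) / temperature
    logits = [pos] + [cosine(feature, p) / temperature for p in previous_protos]
    # Numerically stable log-sum-exp over positive and negative logits.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos


# Toy 2-D example: one previous-task prototype and one current-task prototype.
previous_pool = [[1.0, 0.0]]
current = mean_prototype([[0.0, 0.8], [0.0, 1.2]])  # -> [0.0, 1.0]

well_separated = task_contrastive_loss([0.1, 0.9], current, previous_pool)
overlapping = task_contrastive_loss([0.9, 0.1], current, previous_pool)
```

In this toy setting, a feature that sits near the current-task prototype incurs a much smaller loss than one that drifts toward a previous task's prototype, which is the separation effect the abstract attributes to domain shift.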

Paper Structure (42 sections, 14 equations, 7 figures, 14 tables)

Introduction
Related Works
Empirical Study
Problem Setup
Observation of Domain Shift on CIL
Empirical Study Setup
Observation
Quantitative Analysis
Method
Task-level and Class-level Regularization
Cross-task Contrastive Distillation
Generalize to Various Types of CIL Methods
Experiments
Experiment Setup
Evaluation on Three Benchmarks
...and 27 more sections

Figures (7)

Figure 1: The key finding of our work: incorporating domain shift in class-incremental learning contributes to a clearer separation of the feature space and better resistance to forgetting.
Figure 2: Illustration of how the two scenarios are constructed on each of the two datasets. In the CIFAR-100 panel, we use AvatarNet to synthesize images of new domains on CIFAR-100. We use the original images as the training/testing set for the CIL scenario and the synthesized images (termed DomainCIFAR-100) as the training/testing set for the CILD scenario. In the DomainNet panel, we split DomainNet to construct the CIL and CILD scenarios. Images of one domain make up the training/testing set for the CIL scenario, and images of all domains make up the training/testing set for the CILD scenario. In each dataset, the label space $\mathcal{C}_t$ of each task $t$ in CILD is consistent with that of CIL.
Figure 3: t-SNE visualization of features on CIFAR-100. The top row shows features extracted by different continual learning methods under the CIL scenario, and the bottom row shows features under the CILD scenario. Data points from the same task are marked with the same color.
Figure 4: Overview of the DisCo framework. DisCo includes Task- and Class-level Regularization and Cross-task Contrastive Distillation. In Task- and Class-level Regularization, samples in the current task are pulled toward the current task prototype and pushed away from previous task prototypes, leading to a discriminative feature distribution that stays away from other tasks. Cross-task Contrastive Distillation helps align the current model with the previous one and preserves the features of old classes.
Figure 5: Ablation study on incremental task length. $B\{X\}$-$\{Y\}$ means there are $X$ classes in task 0 and the rest are evenly distributed over $Y$ tasks. The y-axis shows $AA_k$ at task $k$.
...and 2 more figures
