Make Domain Shift a Catastrophic Forgetting Alleviator in Class-Incremental Learning
Wei Chen, Yi Zhou
TL;DR
The paper uncovers a counter-intuitive benefit of domain shift for class-incremental learning (CIL), showing that domain variation across tasks reduces forgetting by fostering more separable feature representations. It introduces DisCo, a plug-and-play, contrastive-learning-based method that constructs a prototype pool and enforces task- and class-level margins plus cross-task distillation to suppress interference while preserving old knowledge. Empirical results on CIFAR-100, Fashion-MNIST, and Tiny-ImageNet show substantial reductions in forgetting ($FM$) and gains in average accuracy ($AA$) when DisCo is integrated with diverse CIL methods, including rehearsal-, regularization-, and prompt-based approaches. The approach is validated with multiple metrics ($AA$, $FM$, $PIV$, $PFTS$) and ablations, which highlight the primacy of task-level regularization and the complementary roles of class-level regularization and cross-task distillation. The findings suggest a practical pathway to enhance CIL by simulating domain-shift-induced discriminability at the feature level, with broad implications for real-world continual learning systems.
Abstract
In class-incremental learning (CIL), alleviating catastrophic forgetting is a pivotal challenge. This paper reports a counter-intuitive observation: incorporating domain shift into CIL tasks significantly reduces the forgetting rate. Our comprehensive studies demonstrate that incorporating domain shift leads to a clearer separation in the feature distribution across tasks and helps reduce parameter interference during learning. Inspired by this observation, we propose a simple yet effective method named DisCo for CIL tasks. DisCo introduces a lightweight prototype pool that uses contrastive learning to promote distinct feature distributions for the current task relative to previous ones, effectively mitigating interference across tasks. DisCo can be easily integrated into existing state-of-the-art CIL methods. Experimental results show that incorporating our method into various CIL methods achieves substantial performance improvements, validating the benefits of separating feature representations and reducing interference. These findings indicate that DisCo can serve as a robust approach for future research in class-incremental learning.
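The prototype-pool idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `PrototypePool` and `task_margin_loss`, the use of class-mean features as prototypes, cosine similarity, the hinge form of the penalty, and the margin value are all assumptions made for illustration. The sketch only shows the general shape of a task-level term that pushes current-task features away from prototypes stored for earlier tasks.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


class PrototypePool:
    """Hypothetical pool holding one mean feature vector per (task_id, class_id)."""

    def __init__(self):
        self.prototypes = {}  # (task_id, class_id) -> list of floats

    def add(self, task_id, class_id, features):
        """Store the element-wise mean of `features` as the class prototype."""
        n, dim = len(features), len(features[0])
        mean = [sum(f[i] for f in features) / n for i in range(dim)]
        self.prototypes[(task_id, class_id)] = mean

    def from_other_tasks(self, task_id):
        """Return prototypes belonging to every task except `task_id`."""
        return [p for (t, _), p in self.prototypes.items() if t != task_id]


def task_margin_loss(feature, task_id, pool, margin=0.2):
    """Assumed hinge penalty: average amount by which the similarity between
    `feature` and each other-task prototype exceeds `margin`."""
    others = pool.from_other_tasks(task_id)
    if not others:
        return 0.0
    return sum(max(0.0, cosine(feature, p) - margin) for p in others) / len(others)


# Usage: a task-1 feature aligned with a task-0 prototype is penalized,
# while an orthogonal feature is not.
pool = PrototypePool()
pool.add(task_id=0, class_id=0, features=[[1.0, 0.0], [1.0, 0.0]])
overlapping = task_margin_loss([1.0, 0.0], task_id=1, pool=pool)
separated = task_margin_loss([0.0, 1.0], task_id=1, pool=pool)
```

In a real training loop a term of this kind would be added to the base CIL method's loss and computed on differentiable tensors; the class-level margin and cross-task distillation mentioned in the TL;DR are not covered by this sketch.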
