Improving Forward Compatibility in Class Incremental Learning by Increasing Representation Rank and Feature Richness

Jaeill Kim, Wonseok Lee, Moonjung Eo, Wonjong Rhee

TL;DR

The paper tackles forward compatibility in class incremental learning (CIL) by increasing the effective rank of representations during the base session through a differentiable regularization term, a method the authors call RFR (effective-Rank based Feature Richness enhancement). It formalizes the notion of effective rank ($\mathsf{erank}$) and connects it to Shannon entropy, showing that maximizing $\mathsf{erank}$ increases feature richness and downstream usefulness for novel tasks. Empirically, RFR improves forward compatibility across eleven established CIL methods on CIFAR-100 and ImageNet-100 while also mitigating forgetting, and ablations confirm robustness to hyperparameters and dataset scale. The approach adds minimal overhead and applies to both CNNs and vision transformers, making it a versatile addition to existing CIL strategies.

Abstract

Class Incremental Learning (CIL) constitutes a pivotal subfield within continual learning, aimed at enabling models to progressively learn new classification tasks while retaining knowledge obtained from prior tasks. Although previous studies have predominantly focused on backward compatible approaches to mitigate catastrophic forgetting, recent investigations have introduced forward compatible methods to enhance performance on novel tasks and complement existing backward compatible methods. In this study, we introduce an effective-Rank based Feature Richness enhancement (RFR) method, designed for improving forward compatibility. Specifically, this method increases the effective rank of representations during the base session, thereby facilitating the incorporation of more informative features pertinent to unseen novel tasks. Consequently, RFR achieves dual objectives in backward and forward compatibility: minimizing feature extractor modifications and enhancing novel task performance, respectively. To validate the efficacy of our approach, we establish a theoretical connection between effective rank and the Shannon entropy of representations. Subsequently, we conduct comprehensive experiments by integrating RFR into eleven well-known CIL methods. Our results demonstrate the effectiveness of our approach in enhancing novel-task performance while mitigating catastrophic forgetting. Furthermore, our method notably improves the average incremental accuracy across all eleven cases examined.
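
The effective rank that RFR raises can be made concrete with a small NumPy sketch. It assumes the common entropy-based definition (the exponential of the Shannon entropy of the singular values normalized to sum to one), applied directly to a feature matrix. The function name `effective_rank`, the `eps` guard, and the toy matrices are illustrative assumptions; the paper's exact formulation and its regularization term may differ.

```python
import numpy as np


def effective_rank(features: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of a (n_samples, dim) feature matrix.

    Assumed definition: exp(H(p)), where p holds the singular values
    normalized to sum to one and H is the Shannon entropy.
    """
    singular_values = np.linalg.svd(features, compute_uv=False)
    p = singular_values / (singular_values.sum() + eps)
    p = p[p > eps]  # drop numerically zero entries so log() stays finite
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


rng = np.random.default_rng(0)

# Toy "rich" features: 16 roughly independent directions.
rich = rng.normal(size=(512, 16))

# Toy "collapsed" features: every column is a copy of one direction.
collapsed = rich[:, :1] @ np.ones((1, 16))

rich_erank = effective_rank(rich)
collapsed_erank = effective_rank(collapsed)
identity_erank = effective_rank(np.eye(8))  # equal singular values

print(f"rich: {rich_erank:.2f}, collapsed: {collapsed_erank:.2f}")
```

In this toy setting the collapsed features score close to 1 and the identity matrix scores its full dimension, which is the sense in which a higher effective rank indicates that more feature directions carry information.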

Paper Structure

This paper contains 30 sections, 1 theorem, 7 equations, 7 figures, and 10 tables.

Key Result

Theorem 1

For a representation $\bm{h} \in \mathbb{R}^{d}$ that follows a multivariate Gaussian distribution, the entropy of the representation is maximized if its effective rank is maximized.
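
The intuition behind Theorem 1 can be sketched with the standard Gaussian entropy formula. The sketch below is not the paper's proof: it assumes that the effective rank is computed from the eigenvalues $\lambda_1, \dots, \lambda_d$ of the covariance $\Sigma$ of $\bm{h}$, and that the total variance $c = \sum_i \lambda_i$ is held fixed. The symbols $\Sigma$, $\lambda_i$, $p_i$, and $c$ are introduced here for illustration only.

```latex
% Differential entropy of h ~ N(mu, Sigma) with eigenvalues lambda_1..lambda_d:
H(\bm{h}) = \frac{d}{2}\log(2\pi e) + \frac{1}{2}\sum_{i=1}^{d}\log\lambda_i

% Normalized spectrum and the (assumed) entropy-based effective rank:
p_i = \frac{\lambda_i}{\sum_{j=1}^{d}\lambda_j}, \qquad
\mathsf{erank} = \exp\!\Big(-\sum_{i=1}^{d} p_i \log p_i\Big) \le d

% With the trace fixed, c = \sum_i \lambda_i, so \lambda_i = c\,p_i and
H(\bm{h}) = \frac{d}{2}\log(2\pi e\, c) + \frac{1}{2}\sum_{i=1}^{d}\log p_i

% Jensen's inequality bounds the last term:
\sum_{i=1}^{d}\log p_i \le d \log\!\Big(\frac{1}{d}\sum_{i=1}^{d} p_i\Big) = -d\log d,
\quad \text{with equality iff } p_i = \tfrac{1}{d} \text{ for all } i.
```

Under these assumptions the uniform spectrum $p_i = 1/d$ is both the point where $\mathsf{erank}$ attains its maximum $d$ and the point where the Gaussian entropy is largest, which is consistent with the statement of the theorem.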

Figures (7)

  • Figure 1: Impact of increasing representation rank during the base session. UCIR models are analyzed with and without the integration of our method. A ResNet-18 model is trained on the CIFAR-100 dataset, using 50 base classes and a split size of 10 classes for each novel session. (a) Effective rank of the feature extractor. (b) Novel task performance in each novel session. (c) The degree of catastrophic forgetting that occurs for the base task.
  • Figure 2: Rank vs. feature richness. ResNet-18 was trained on the ImageNet-100 dataset. (a) Supervised learning: as more classes are included in training that starts from scratch, both $\mathsf{trank}$ and $\mathsf{erank}$ increase, while the algebraic rank remains at its maximum value. (b) Unsupervised learning with the SimCLR loss: as unsupervised representation learning proceeds, both $\mathsf{trank}$ and $\mathsf{erank}$ increase, while the algebraic rank remains at its maximum value.
  • Figure 3: Improvements in forward compatibility. Overall accuracy at each session is shown for UCIR. Two ResNet-18 models are trained with and without RFR on the ImageNet-100 dataset, using 50 base classes under different split sizes of (a) 10, (b) 5, and (c) 2 classes for each novel session. The feature extractors trained on the 50 classes of the base task remain frozen during novel sessions.
  • Figure 7: Weight change from the base session for UCIR. Two ResNet-18 models are trained with and without RFR on the ImageNet-100 dataset, using 50 base classes under different split sizes of (a) 10, (b) 5, and (c) 2 classes for each novel session.
  • Figure 11: Catastrophic forgetting in the base task for UCIR. Two ResNet-18 models are trained with and without RFR on the ImageNet-100 dataset, using 50 base classes under different split sizes of (a) 10, (b) 5, and (c) 2 classes for each novel session.
  • ...and 2 more figures
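
The Figure 2 observation that the algebraic rank stays at its maximum while $\mathsf{erank}$ changes can be reproduced on toy matrices. The sketch below assumes the entropy-based definition of effective rank over normalized singular values; the helper `effective_rank` and the two hand-built spectra are hypothetical and are not taken from the paper. $\mathsf{trank}$ is not covered because its definition does not appear in this summary.

```python
import numpy as np


def effective_rank(matrix: np.ndarray, eps: float = 1e-12) -> float:
    """exp of the Shannon entropy of the normalized singular values (assumed definition)."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    p = singular_values / (singular_values.sum() + eps)
    p = p[p > eps]  # avoid log(0)
    return float(np.exp(-np.sum(p * np.log(p))))


dim = 16

# Skewed spectrum: one dominant direction, fifteen weak but nonzero ones.
skewed = np.diag([1.0] + [1e-3] * (dim - 1))

# Flat spectrum: every direction carries the same energy.
flat = np.eye(dim)

for name, matrix in [("skewed", skewed), ("flat", flat)]:
    algebraic = np.linalg.matrix_rank(matrix)
    print(f"{name}: algebraic rank = {algebraic}, erank = {effective_rank(matrix):.2f}")
```

Both matrices are full rank in the algebraic sense, yet only the flat spectrum has an effective rank equal to the dimension. This is why a saturated algebraic rank says little about feature richness, whereas the effective rank still distinguishes the two cases.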

Theorems & Definitions (2)

  • Theorem 1
  • proof