Rotation Augmented Distillation for Exemplar-Free Class Incremental Learning with Detailed Analysis

Xiuwei Chen, Xiaobin Chang

TL;DR

This paper proposes Rotation Augmented Distillation (RAD), a simple CIL method that achieves top-tier performance under the exemplar-free setting and benefits from a superior balance between plasticity and stability.

Abstract

Class incremental learning (CIL) aims to recognize both old and new classes across incremental tasks. Deep neural networks in CIL suffer from catastrophic forgetting, and some approaches alleviate this problem by saving exemplars from previous tasks, which is known as the exemplar-based setting. In contrast, this paper focuses on the exemplar-free setting, in which no old-class samples are preserved. Balancing plasticity and stability in deep feature learning is more challenging when supervision comes only from new classes. Most existing exemplar-free CIL methods report only overall performance and lack further analysis. In this work, different methods are examined in greater detail with complementary metrics. Moreover, we propose a simple CIL method, Rotation Augmented Distillation (RAD), which achieves top-tier performance under the exemplar-free setting. Detailed analysis shows that RAD benefits from a superior balance between plasticity and stability. Finally, more challenging exemplar-free settings with fewer initial classes are used for further demonstrations and comparisons among state-of-the-art methods.

Paper Structure (10 sections, 6 equations, 4 figures, 4 tables)

This paper contains 10 sections, 6 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Incremental accuracy on TinyImageNet and CIFAR100 with 10 incremental steps. The top-1 accuracy (%) after learning each task is shown. Existing SOTA methods achieve similar performance, and the proposed Rotation Augmented Distillation (RAD) achieves SOTA performance as well. The black dot in the upper right corner indicates the upper bound, a model trained on all the data. The black dotted line indicates the lower bound, a simple fine-tuning method.
  • Figure 2: Illustration of the proposed Rotation Augmented Distillation (RAD) method for exemplar-free class incremental learning at task $t$. $^{*}$ indicates that the corresponding module is frozen during training.
  • Figure 3: Incremental accuracy curves of the SOTA methods. Each point represents the incremental classification accuracy (%) after the model learns each task.
  • Figure 4: Impact of varying $\alpha$ and $\beta$ on the overall results for ImgNet100 B50 with 10 steps.
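
The Figure 2 caption pairs rotation augmentation with distillation against a module that is frozen during training. The following is a minimal sketch of that general idea, not the paper's implementation. It assumes that the frozen module is the model from the previous task, that each new-class image is expanded into its four 90-degree rotations, and that distillation is a squared-L2 distance between old and new features averaged over all views. The names `rotate90`, `four_rotations`, `rad_distillation_loss`, and the toy feature extractor `row_means` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]   # single-channel H x W image (toy representation)
Features = List[float]
Model = Callable[[Image], Features]


def rotate90(img: Image) -> Image:
    """Rotate a 2-D image by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*img)][::-1]


def four_rotations(img: Image) -> List[Image]:
    """Return the 0, 90, 180, and 270 degree views of an image."""
    views = [img]
    for _ in range(3):
        views.append(rotate90(views[-1]))
    return views


def rad_distillation_loss(old_model: Model, new_model: Model,
                          images: Sequence[Image]) -> float:
    """Mean squared-L2 feature distance over all rotated views.

    Assumption: old_model is frozen, so it is only evaluated here and
    never updated; the loss form itself is also an assumption.
    """
    if not images:
        raise ValueError("at least one image is required")
    total, count = 0.0, 0
    for img in images:
        for view in four_rotations(img):
            f_old = old_model(view)
            f_new = new_model(view)
            total += sum((a - b) ** 2 for a, b in zip(f_old, f_new))
            count += 1
    return total / count


def row_means(img: Image) -> Features:
    """Toy feature extractor (hypothetical): the mean of each image row."""
    return [sum(row) / len(row) for row in img]
```

With identical old and new models the loss is zero, and it grows as the new model's features drift from the old model's on any of the rotated views. In a real training loop this term would be combined with a classification loss on the new classes; the weighting between such terms is not specified in the text above and is left out of the sketch.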