Reducing Catastrophic Forgetting in Online Class Incremental Learning Using Self-Distillation

Kotaro Nagata, Hiromu Ono, Kazuhiro Hotta

TL;DR

The paper tackles catastrophic forgetting in online class-incremental learning by introducing a self-distillation loss that leverages highly generalized features from shallow layers to regularize deeper representations. It also introduces a memory-update strategy that prioritizes misclassified samples to diversify learning under a fixed memory budget. Building on Supervised Contrastive Replay (SCR), the method adds a KL-divergence-based distillation between shallow and deep feature similarities and uses a Nearest Class Mean (NCM) classifier for inference, with prototypes updated from the memory buffer. Experiments on Split CIFAR-10, CIFAR-100, and MiniImageNet show improved end-task accuracy, especially at small buffers, indicating enhanced generalization and robustness for online continual learning.
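
The Nearest Class Mean inference step mentioned above can be illustrated with a minimal sketch. It assumes feature vectors have already been extracted for the samples in the memory buffer; the function names (`class_means`, `ncm_predict`), the toy data, and the choice of Euclidean distance are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import defaultdict


def class_means(features, labels):
    """Average the feature vectors of each class to get one prototype per class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def ncm_predict(prototypes, query):
    """Return the label whose prototype is nearest to the query (Euclidean distance, an assumption)."""
    return min(prototypes, key=lambda label: math.dist(prototypes[label], query))


# Hypothetical features of buffered samples and their class labels.
buffer_features = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
buffer_labels = [0, 0, 1, 1]

prototypes = class_means(buffer_features, buffer_labels)
print(prototypes)                           # {0: [0.0, 1.0], 1: [5.0, 4.0]}
print(ncm_predict(prototypes, [1.0, 1.0]))  # 0
```

Because the prototypes are recomputed from whatever the buffer currently holds, a classifier of this kind needs no trained output layer and follows the buffer as it changes.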

Abstract

Continual learning suffers from catastrophic forgetting, in which a model loses previously acquired knowledge when it learns new tasks. Various methods have been proposed to address this problem. Replay methods, which replay data from previous tasks during later training, have shown good accuracy. However, because the memory buffer is limited, replay methods generalize poorly. In this paper, we address this problem by acquiring transferable knowledge through self-distillation, using the highly generalizable output of a shallow layer as the teacher. Furthermore, with a large number of classes or challenging data, there is a risk that training does not converge and never even reaches the point of overfitting. We therefore aim for more efficient and thorough learning with a new memory update method that prioritizes storing easily misclassified samples. Experiments on the CIFAR-10, CIFAR-100, and MiniImageNet datasets confirm that the proposed method outperforms conventional methods.
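
To make the self-distillation idea concrete, here is a rough sketch of a KL-divergence loss between batch-wise similarity maps, with the shallow layer acting as teacher and the deep layer as student. The source only states that shallow and deep feature similarities are aligned with a KL-based loss; the row-wise softmax, the `temperature` value, the direction KL(teacher || student), and all function names below are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def similarity_rows(features, temperature=0.5):
    """Row-wise softmax over pairwise cosine similarities within a batch."""
    normed = [l2_normalize(f) for f in features]
    rows = []
    for a in normed:
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in normed]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def similarity_kl(shallow_features, deep_features, temperature=0.5):
    """Mean KL(teacher || student) over rows; teacher = shallow, student = deep."""
    teacher = similarity_rows(shallow_features, temperature)
    student = similarity_rows(deep_features, temperature)
    total = 0.0
    for t_row, s_row in zip(teacher, student):
        total += sum(t * math.log(t / s) for t, s in zip(t_row, s_row))
    return total / len(teacher)


# Hypothetical batch of three samples; the two layers may have different widths.
shallow = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
deep = [[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.0, 0.0, 1.0]]
loss = similarity_kl(shallow, deep)
```

Comparing similarity maps rather than raw features means the two layers do not need the same dimensionality: each map is batch-size by batch-size regardless of feature width.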

Paper Structure

This paper contains 16 sections, 6 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the proposed method. The approach builds on Supervised Contrastive Replay (SCR), a replay method, and distills knowledge from shallow layers by aligning the similarity maps of normalized features.
  • Figure 2: The new memory update method. The $N$ images with the lowest predicted probability for the correct class are selected (in this experiment, $N = 5$) and given priority for storage.
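
The selection step described in the Figure 2 caption can be sketched as follows. This is only an illustration: it assumes per-sample class probabilities are available for the incoming batch, and the function name `hardest_indices` and the toy values are hypothetical. How the selected samples then enter or replace entries in the buffer is not specified in this section, so the sketch stops at selection.

```python
def hardest_indices(probabilities, labels, n=5):
    """Return indices of the n samples with the lowest probability on their true class."""
    correct = [probs[label] for probs, label in zip(probabilities, labels)]
    order = sorted(range(len(correct)), key=lambda i: correct[i])
    return order[:n]


# Hypothetical softmax outputs for four samples over two classes.
batch_probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]]
batch_labels = [0, 0, 1, 1]

# Correct-class probabilities are 0.9, 0.4, 0.8, 0.3, so samples 3 and 1 are hardest.
print(hardest_indices(batch_probs, batch_labels, n=2))  # [3, 1]
```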