Joint Input and Output Coordination for Class-Incremental Learning

Shuai Wang, Yibing Zhan, Yong Luo, Han Hu, Wei Yu, Yonggang Wen, Dacheng Tao

TL;DR

A joint input and output coordination (JIOC) mechanism is proposed that assigns different weights to different categories of data according to the gradient of the output score, and uses knowledge distillation to reduce the mutual interference between the outputs of old and new tasks.

Abstract

Incremental learning is nontrivial due to severe catastrophic forgetting. Although storing a small amount of data from old tasks during incremental learning is a feasible solution, current strategies still do not 1) adequately address the class bias problem, 2) alleviate the mutual interference between new and old tasks, or 3) consider the problem of class bias within tasks. This motivates us to propose a joint input and output coordination (JIOC) mechanism to address these issues. This mechanism assigns different weights to different categories of data according to the gradient of the output score, and uses knowledge distillation (KD) to reduce the mutual interference between the outputs of old and new tasks. The proposed mechanism is general and flexible, and can be incorporated into different incremental learning approaches that use memory storage. Extensive experiments show that our mechanism can significantly improve their performance.
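The input-side idea (weighting data according to the gradient of the output score) can be illustrated with a minimal sketch. The exact weighting formula is not stated in this summary, so the following are assumptions made for illustration only: a softmax cross-entropy classifier, a per-sample gradient magnitude taken as $|\hat{p} - y|$ on the true class, and weights normalized to have mean 1 within a batch. The function names `gradient_based_weights` and `weighted_cross_entropy` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_based_weights(batch_logits, labels):
    """Assumed weighting rule: weight each sample by |p_hat - y| on its
    true class (where y = 1), normalized so the batch mean weight is 1."""
    grads = []
    for logits, label in zip(batch_logits, labels):
        p_true = softmax(logits)[label]
        grads.append(abs(p_true - 1.0))
    mean_grad = sum(grads) / len(grads)
    if mean_grad == 0.0:
        # Every sample is already fit perfectly; fall back to uniform weights.
        return [1.0] * len(grads)
    return [g / mean_grad for g in grads]


def weighted_cross_entropy(batch_logits, labels, weights):
    """Mean of per-sample cross-entropy losses scaled by the given weights."""
    losses = [-math.log(softmax(l)[y]) for l, y in zip(batch_logits, labels)]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Usage: a confidently classified sample and a poorly fit one.
batch_logits = [[4.0, 0.0, 0.0], [0.5, 0.4, 0.3]]
labels = [0, 0]
weights = gradient_based_weights(batch_logits, labels)
loss = weighted_cross_entropy(batch_logits, labels, weights)
```

Under this assumed rule, samples the model already fits well receive small weights, while poorly fit samples (for example, from under-represented classes) receive larger ones. Recomputing the weights at every step would make them adapt during training.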

Paper Structure

This paper contains 17 sections, 18 equations, 4 figures, 6 tables, and 1 algorithm.

Figures (4)

  • Figure 1: An illustration of the class imbalance and mutual interference issues. The difference in the number of input data for each class between tasks and within tasks makes the weights of fully connected layers greatly biased (neuron size). The output scores of data from old tasks ($1, \cdots, t-1$) on the classification heads of new task $t$ should approximate zero, but may be much larger than zero (green solid line) after training the new task model. The output scores of data from the new task on the classification heads of old tasks may be inconsistent before (blue dotted line) and after (blue solid line) updating the old task models.
  • Figure 2: Overall structure of the proposed method. First, the absolute gradient of the output scores is computed from $\hat{p}^{\tau}_{i,j,k=i}$ and $y^{\tau}_{i,j,k=i}$ to induce a weight for each sample; the weights are adaptively updated during training. Then $L_{OC,1\rightarrow t-1}$ is employed to maintain the outputs for each old task. $L_{OC,1\rightarrow t-1}$ is also used to make the outputs of new task data on old task classification heads after updating the old task models agree with those before the update. Finally, to suppress the outputs of old task data on new task classification heads, their output scores $\hat{p}^{t}_{i,j,k}$ are directly optimized to approach zero. (The solid blue and solid green lines represent the output distributions of the new task and old task data, respectively, on the new task classification head; the dashed blue and dashed green lines represent the output distributions of the new task and old task data, respectively, on the old task classification head.)
  • Figure 3: A comparison of the SSIL approach (left) and the proposed output coordination (right). In SSIL, only the outputs of old task data on old classification heads are kept consistent before and after updating. We improve it by further enforcing the outputs of new task data on old classification heads to be consistent, and suppress the outputs of old task data on new classification heads.
  • Figure 4: A comparison of the experimental results of SSIL and SSIL_OC. SSIL_OC is formed by integrating the proposed OC strategy into the SSIL algorithm; $\ast$ denotes our implementation.
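The output coordination described in the captions of Figures 2 and 3 can be sketched as two terms. The first is a distillation term that keeps the outputs on each old classification head consistent before and after the update, for both old and new task data. The second is a suppression term that pushes the outputs of old task data on the new classification head toward zero. The concrete forms are not given in this summary, so the sketch assumes a temperature-softened KL divergence for distillation and a mean squared penalty for suppression. The names `kd_loss`, `suppression_loss`, and `output_coordination` are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(prev_scores, curr_scores, temperature=2.0):
    """Assumed distillation form: KL(prev || curr) on softened outputs of
    one old-task head, before (prev) and after (curr) the model update."""
    p = softmax(prev_scores, temperature)
    q = softmax(curr_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def suppression_loss(new_head_scores):
    """Assumed suppression form: mean squared score, which is minimized
    when every output on the new-task head is zero."""
    return sum(s * s for s in new_head_scores) / len(new_head_scores)


def output_coordination(prev_old_heads, curr_old_heads, curr_new_head,
                        is_old_task_sample, temperature=2.0):
    """Per-sample output coordination loss.

    prev_old_heads / curr_old_heads: one score list per old task head.
    The distillation term applies to every sample (old or new task data);
    the suppression term applies only to old task data.
    """
    kd = sum(kd_loss(p, c, temperature)
             for p, c in zip(prev_old_heads, curr_old_heads))
    suppress = suppression_loss(curr_new_head) if is_old_task_sample else 0.0
    return kd + suppress


# Usage: an old-task sample whose old-head outputs drifted after the update.
loss = output_coordination(
    prev_old_heads=[[2.0, 0.5]],
    curr_old_heads=[[1.0, 1.0]],
    curr_new_head=[0.8, -0.3],
    is_old_task_sample=True,
)
```

Applying the distillation term to new task data as well as old task data is the difference from SSIL that Figure 3 highlights. In the sketch, that is simply the absence of a condition on `is_old_task_sample` in the `kd` term.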