Decision Boundary-aware Knowledge Consolidation Generates Better Instance-Incremental Learner

Qiang Nie, Weifu Fu, Yuhuan Lin, Jialin Li, Yifeng Zhou, Yong Liu, Lei Zhu, Chengjie Wang

TL;DR

This work defines a practical instance-incremental learning (IIL) setting in which only new observations are available and old data cannot be accessed, so the model must handle both catastrophic forgetting and concept drift. It proposes a decision boundary-aware distillation (DBD) framework coupled with knowledge consolidation via KC-EMA to transfer and stabilize new knowledge while preserving prior boundaries. Evaluated on CIFAR-100 and ImageNet-100 benchmarks, the method reports state-of-the-art performance promotion with low forgetting, and it is validated through extensive ablations on fused labels, the dusted input space, and EMA scheduling. The approach offers a cost-effective, privacy-friendly pathway for continual model improvement in real deployments, with potential extensions to few-shot IIL and improved consolidation mechanisms.

Abstract

Instance-incremental learning (IIL) focuses on learning continually with data of the same classes. Compared to class-incremental learning (CIL), IIL is seldom explored because it suffers less from catastrophic forgetting (CF). However, besides retaining knowledge, in real-world deployment scenarios where the class space is always predefined, continual and cost-effective model promotion with the potential unavailability of previous data is a more essential demand. Therefore, we first define a new and more practical IIL setting: promoting the model's performance, in addition to resisting CF, with only new observations. Two issues have to be tackled in the new IIL setting: 1) the notorious catastrophic forgetting caused by having no access to old data, and 2) broadening the existing decision boundary to new observations because of concept drift. To tackle these problems, our key insight is to moderately broaden the decision boundary to cover failure cases while retaining the old boundary. Hence, we propose a novel decision boundary-aware distillation method that consolidates knowledge into the teacher to ease the student's learning of new knowledge. We also establish benchmarks on the existing datasets CIFAR-100 and ImageNet. Notably, extensive experiments demonstrate that the teacher model can be a better incremental learner than the student model, which overturns previous knowledge distillation-based methods that treat the student as the main role.
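
As a rough illustration of the "consolidating knowledge to teacher" idea, the sketch below shows a generic exponential-moving-average (EMA) update that folds student parameters back into the teacher. It is a minimal sketch under stated assumptions, not the paper's KC-EMA: the function name `ema_consolidate`, the momentum `alpha`, and the plain-dictionary parameter format are all hypothetical, and the schedule that decides when consolidation happens is not specified in this section.

```python
from typing import Dict, List

# Hypothetical parameter container: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_consolidate(teacher: Params, student: Params, alpha: float = 0.99) -> Params:
    """Return new teacher parameters: alpha * teacher + (1 - alpha) * student.

    `alpha` is an assumed momentum value; a value close to 1 keeps the
    teacher near its old weights (and hence near the old decision boundary).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")

    updated: Params = {}
    for name, t_weights in teacher.items():
        s_weights = student[name]
        if len(t_weights) != len(s_weights):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise moving average of teacher and student weights.
        updated[name] = [
            alpha * t + (1.0 - alpha) * s for t, s in zip(t_weights, s_weights)
        ]
    return updated


# Toy usage: with alpha = 0.5 the teacher moves halfway toward the student.
teacher = {"w": [1.0, 2.0]}
student = {"w": [3.0, 0.0]}
new_teacher = ema_consolidate(teacher, student, alpha=0.5)
```

In this sketch the teacher is the model that would be kept for inference, which matches the abstract's claim that the teacher can be the better incremental learner; how closely the update rule matches the authors' implementation is an assumption.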

Paper Structure (23 sections, 13 equations, 11 figures, 6 tables, 1 algorithm)

Figures (11)

  • Figure 1: Illustration of the new IIL setting. At the IIL learning phase $t>0$, only the new data $D_n(t)$, which is much smaller than the base data, is available. The model should be promoted by leveraging only the new data each time, and should seek a performance close to that of the full-data model trained on all accumulated data. Fine-tuning with early stopping fails to enhance the model in the new IIL setting.
  • Figure 2: Decision boundaries (DB): (a) DBs learned from old data and new data, respectively. With respect to the old DB, new data can be categorized into inner samples and outer samples. (b) The ideal DB obtained by jointly training on the old data and new data. (c) Fine-tuning the model on the new data with one-hot labels suffers from CF. (d) Learning with distillation on prototype exemplars causes overfitting to these exemplars and DB collapse. (e) The DB achieved using our decision boundary-aware distillation (DBD).
  • Figure 3: Comparison between (a) previous distillation-based methods, which perform inference with the student model (S), and (b) the proposed decision boundary-aware distillation (DBD) with knowledge consolidation (KC). We use the teacher model (T) for inference.
  • Figure 4: Detailed performance promotion ($PP$) and forgetting rate ($\mathcal{F}$) at each IIL phase. Best viewed in color and zoomed in.
  • Figure 5: Effect of the three components in DBD: fused label (FL), dusted input space (DIS), and knowledge consolidation (KC). All components contribute to continual knowledge accumulation with new data.
  • ...and 6 more figures