On Distilling the Displacement Knowledge for Few-Shot Class-Incremental Learning

Pengfei Fang; Yongchun Qin; Hui Xue

TL;DR

This work addresses catastrophic forgetting in few-shot class-incremental learning (FSCIL) by replacing similarity-based relational distillation with Displacement Knowledge Distillation (DKD), which preserves structural relationships between samples through pairwise displacement vectors in the original feature space. The authors propose the Dual Distillation Network (DDNet), which applies traditional knowledge distillation (IKD) to base classes and DKD to novel classes, and uses an instance-aware sample selector to fuse the predictions of the two branches during inference. On CIFAR-100, miniImageNet, and CUB-200, the authors report state-of-the-art performance in terms of Knowledge Retention (KR) and robustness to outliers, with DKD providing notable gains in novel-class discrimination. The methodology generalizes beyond FSCIL to broader class-incremental learning settings, suggesting DKD as a versatile distillation paradigm for maintaining distributional consistency across sessions.

Abstract

Few-shot Class-Incremental Learning (FSCIL) addresses the challenges of evolving data distributions and the difficulty of data acquisition in real-world scenarios. To counteract the catastrophic forgetting typically encountered in FSCIL, knowledge distillation is employed as a way to maintain the knowledge from learned data distribution. Recognizing the limitations of generating discriminative feature representations in a few-shot context, our approach incorporates structural information between samples into knowledge distillation. This structural information serves as a remedy for the low quality of features. Diverging from traditional structured distillation methods that compute sample similarity, we introduce the Displacement Knowledge Distillation (DKD) method. DKD utilizes displacement rather than similarity between samples, incorporating both distance and angular information to significantly enhance the information density retained through knowledge distillation. Observing performance disparities in feature distribution between base and novel classes, we propose the Dual Distillation Network (DDNet). This network applies traditional knowledge distillation to base classes and DKD to novel classes, challenging the conventional integration of novel classes with base classes. Additionally, we implement an instance-aware sample selector during inference to dynamically adjust dual branch weights, thereby leveraging the complementary strengths of each approach. Extensive testing on three benchmarks demonstrates that DDNet achieves state-of-the-art results. Moreover, through rigorous experimentation and comparison, we establish the robustness and general applicability of our proposed DKD method.
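
The contrast between similarity and displacement can be illustrated with a small NumPy sketch. This is not the authors' loss: the function names, the mean-squared-error form of both losses, and the toy data are assumptions chosen only to show why a pairwise displacement vector carries distance as well as angular information, while a cosine-similarity matrix carries angles alone.

```python
import numpy as np


def pairwise_displacements(features):
    """Return an (n, n, d) array whose [i, j] entry is f_i - f_j."""
    return features[:, None, :] - features[None, :, :]


def displacement_distillation_loss(teacher_feats, student_feats):
    """Hypothetical loss: mean squared gap between teacher and student displacements."""
    d_teacher = pairwise_displacements(teacher_feats)
    d_student = pairwise_displacements(student_feats)
    return float(np.mean(np.sum((d_teacher - d_student) ** 2, axis=-1)))


def cosine_similarity_matrix(features):
    """Return the (n, n) matrix of cosine similarities between rows."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed.T


def similarity_distillation_loss(teacher_feats, student_feats):
    """Hypothetical loss: mean squared gap between teacher and student similarities."""
    s_teacher = cosine_similarity_matrix(teacher_feats)
    s_student = cosine_similarity_matrix(student_feats)
    return float(np.mean((s_teacher - s_student) ** 2))


rng = np.random.default_rng(0)
teacher = rng.normal(size=(5, 8))  # toy "old model" features: 5 samples, 8 dims

# A student that triples every feature keeps all angles but changes all distances.
student_scaled = 3.0 * teacher
sim_loss_scaled = similarity_distillation_loss(teacher, student_scaled)
disp_loss_scaled = displacement_distillation_loss(teacher, student_scaled)

# A student that shifts every feature by the same offset keeps all displacements.
student_shifted = teacher + 2.0
disp_loss_shifted = displacement_distillation_loss(teacher, student_shifted)

print("similarity loss, scaled student:   ", sim_loss_scaled)
print("displacement loss, scaled student: ", disp_loss_scaled)
print("displacement loss, shifted student:", disp_loss_shifted)
```

In this toy setting the similarity-based loss cannot see the rescaled student, whereas the displacement-based loss penalizes it; a uniform shift of the whole feature set, which leaves the relative structure intact, is not penalized by the displacement loss.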
