Preview-based Category Contrastive Learning for Knowledge Distillation

Muhe Ding, Jianlong Wu, Xue Dong, Xiaojie Li, Pengda Qin, Tian Gan, Liqiang Nie

TL;DR

This work proposes preview-based category contrastive learning for knowledge distillation (PCKD), a method that distills the structural knowledge of both instance-level feature correspondence and the relation between instance features and category centers in a contrastive learning fashion, explicitly optimizing the category representation.

Abstract

Knowledge distillation is a mainstream model compression technique that transfers knowledge from a larger model (the teacher) to a smaller model (the student) to improve the student's performance. Despite many efforts, existing methods mainly enforce consistency between instance-level feature representations or predictions, neglecting category-level information and the difficulty of each sample, which leads to suboptimal performance. To address these issues, we propose a novel preview-based category contrastive learning method for knowledge distillation (PCKD). It first distills the structural knowledge of both instance-level feature correspondence and the relation between instance features and category centers in a contrastive learning fashion. This explicitly optimizes the category representation and explores the distinct correlation between instance and category representations, yielding discriminative category centers and better classification results. In addition, we introduce a novel preview strategy that dynamically determines how much the student should learn from each sample according to its difficulty. Unlike existing methods that treat all samples equally, and unlike curriculum learning that simply filters out hard samples, our method assigns a small weight to hard instances as a preview to better guide student training. Extensive experiments on several challenging datasets, including CIFAR-100 and ImageNet, demonstrate its superiority over state-of-the-art methods.
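The preview idea in the abstract (hard samples keep a small, non-zero weight instead of being filtered out) can be illustrated with a minimal sketch. The abstract does not give the actual difficulty score or weighting function, so everything below is an assumption made for illustration: the names `preview_weights` and `threshold_schedule`, the use of a positive per-sample score as difficulty, the `threshold / difficulty` decay, and the linear threshold schedule are all hypothetical.

```python
def preview_weights(difficulties, threshold):
    """Assign a learning weight to each sample from its difficulty score.

    Assumption (not from the paper): difficulties are positive numbers,
    e.g. per-sample losses. Samples at or below the threshold get full
    weight; harder samples get a smaller weight instead of being dropped.
    """
    weights = []
    for d in difficulties:
        if d <= threshold:
            weights.append(1.0)            # easy sample: learn fully
        else:
            weights.append(threshold / d)  # hard sample: small "preview" weight
    return weights


def threshold_schedule(epoch, total_epochs, start, end):
    """Hypothetical linear schedule that raises the threshold over training,
    so that hard samples gradually receive larger weights."""
    frac = epoch / max(1, total_epochs - 1)
    return start + frac * (end - start)


def weighted_loss(per_sample_losses, weights):
    """Weighted mean of per-sample losses."""
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / len(weights)


if __name__ == "__main__":
    losses = [0.5, 2.0, 4.0]
    for epoch in (0, 4):
        tau = threshold_schedule(epoch, total_epochs=5, start=1.0, end=2.0)
        w = preview_weights(losses, tau)
        print(epoch, tau, w, weighted_loss(losses, w))
```

In this sketch the hardest sample is never discarded; its weight simply grows as the threshold rises, which is the contrast with curriculum-style filtering that the abstract draws.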

Paper Structure

This paper contains 22 sections, 8 equations, 8 figures, 9 tables, 1 algorithm.

Figures (8)

  • Figure 1: The motivation of our proposed method. (a) Existing knowledge distillation methods mainly transfer knowledge of features and logits, ignoring the category-level information in the parameters of the fully connected layer. (b) Illustration of our proposed preview-based learning strategy. It dynamically adjusts the difficulties of input instances and gradually increases their learning weights during training.
  • Figure 2: The overall framework of our proposed PCKD. We first augment samples, extract features and perform feature alignment ($\mathcal{L}_{FA}$), category center alignment ($\mathcal{L}_{CA}$), and category center contrast ($\mathcal{L}_{CC}$). Then our preview strategy can assign dynamic weights to each sample based on its difficulty score.
  • Figure 3: Illustration of multiplication between the feature vectors and weight matrix in the fully connected layer (best viewed in color). Each column vector in the weight matrix is regarded as a category center, representing a specific category. $B$ is the batch size, $K$ is the feature size, and $C$ denotes the total number of categories.
  • Figure 4: Sample images of different difficulties on various datasets. The difficulty score of these images increases from left to right.
  • Figure 5: Effect of the hyperparameters. (a) Effect of varying weight $\beta_{cc}$. (b) Influence of changing weight $\beta_{fa}$. (c) Influence of varying weight $\beta_{ca}$. (d) Effect of varying parameter $\varepsilon$.
  • ...and 3 more figures
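The Figure 3 caption describes a $B \times K$ feature batch multiplied by a $K \times C$ weight matrix whose columns act as category centers. The sketch below illustrates that reading in plain Python. The function names are hypothetical, and the contrastive term is a generic InfoNCE-style form chosen for illustration; this excerpt does not give the exact definition of $\mathcal{L}_{CC}$, so it should not be read as the paper's loss.

```python
import math


def fc_logits(features, weight):
    """Multiply a B x K feature batch by a K x C weight matrix (B x C logits)."""
    num_k, num_c = len(weight), len(weight[0])
    return [
        [sum(row[k] * weight[k][c] for k in range(num_k)) for c in range(num_c)]
        for row in features
    ]


def category_centers(weight):
    """Return the C columns of the K x C weight matrix, one center per category."""
    return [list(column) for column in zip(*weight)]


def center_contrast_loss(feature, centers, label, temperature=1.0):
    """Assumed InfoNCE-style loss: the center of the true category is the
    positive, the centers of all other categories are negatives."""
    sims = [
        sum(f * c for f, c in zip(feature, center)) / temperature
        for center in centers
    ]
    peak = max(sims)  # subtract the maximum for numerical stability
    log_denominator = peak + math.log(sum(math.exp(s - peak) for s in sims))
    return log_denominator - sims[label]


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 2.0]]          # B = 2, K = 2
    weight = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.5]]  # K = 2, C = 3
    centers = category_centers(weight)
    print(fc_logits(features, weight))
    print(center_contrast_loss(features[0], centers, label=0))
```

Each logit is the dot product of a feature with one category center, so with `temperature=1.0` and unnormalized vectors this loss coincides with ordinary softmax cross-entropy on the fully connected layer's logits.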