Preview-based Category Contrastive Learning for Knowledge Distillation

Muhe Ding; Jianlong Wu; Xue Dong; Xiaojie Li; Pengda Qin; Tian Gan; Liqiang Nie

TL;DR

This work proposes PCKD, a preview-based category contrastive learning method for knowledge distillation. It distills the structural knowledge of both instance-level feature correspondence and the relation between instance features and category centers in a contrastive learning fashion, which explicitly optimizes the category representation.

Abstract

Knowledge distillation is a mainstream approach to model compression that transfers knowledge from a larger model (teacher) to a smaller model (student) to improve the student's performance. Despite many efforts, existing methods mainly investigate the consistency between instance-level feature representations or predictions. They neglect category-level information and the difficulty of each sample, which leads to undesirable performance. To address these issues, we propose a novel preview-based category contrastive learning method for knowledge distillation (PCKD). It first distills the structural knowledge of both instance-level feature correspondence and the relation between instance features and category centers in a contrastive learning fashion. This explicitly optimizes the category representation and explores the distinct correlation between representations of instances and categories, contributing to discriminative category centers and better classification results. In addition, we introduce a novel preview strategy that dynamically determines how much the student should learn from each sample according to its difficulty. Unlike existing methods that treat all samples equally, and unlike curriculum learning, which simply filters out hard samples, our method assigns a small weight to hard instances as a preview to better guide student training. Extensive experiments on several challenging datasets, including CIFAR-100 and ImageNet, demonstrate its superiority over state-of-the-art methods.
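
The two ideas in the abstract, contrasting instance features against category centers and down-weighting hard samples as a preview, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the cosine-similarity softmax form, the `temperature` value, the use of the per-sample loss as a difficulty proxy, the median threshold, and the `small_weight` value are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def category_contrastive_loss(features, centers, labels, temperature=0.1):
    """Per-sample loss that pulls each feature toward its own category center
    and pushes it away from the other centers (assumed softmax-over-cosine form)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    logits = f @ c.T / temperature                    # shape (N, K)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels]  # shape (N,)

def preview_weights(difficulty, threshold, small_weight=0.1):
    """Assumed preview rule: easy samples keep weight 1, hard samples get a
    small but nonzero weight instead of being filtered out."""
    return np.where(difficulty <= threshold, 1.0, small_weight)

# Toy usage with synthetic data (3 categories, 8-dimensional features).
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 8))
labels = np.array([0, 1, 2, 0])
features = centers[labels] + 0.05 * rng.normal(size=(4, 8))

per_sample = category_contrastive_loss(features, centers, labels)
# Hypothetical difficulty proxy: the per-sample loss itself.
weights = preview_weights(per_sample, threshold=np.median(per_sample))
weighted_loss = float((weights * per_sample).mean())
```

The key design point the sketch tries to capture is that hard samples still contribute to the loss with a reduced weight, in contrast to a curriculum that drops them entirely.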
