Learn from Balance: Rectifying Knowledge Transfer for Long-Tailed Scenarios

Xinlei Huang, Jialiang Tang, Xubin Zheng, Jinjia Zhou, Wenxin Yu, Ning Jiang

TL;DR

This paper proposes Knowledge Rectification Distillation (KRDistill), a framework that addresses the imbalanced knowledge inherited by the teacher network by incorporating balanced category priors, and that rectifies the teacher network's biased predictions, particularly for the tail categories.
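Incorporating category priors into a classifier's outputs can be illustrated with post-hoc logit adjustment, a common long-tail technique used here only as a stand-in. It is not claimed to be KRDistill's rule. Each logit z_c is shifted by -τ·log π_c, where π_c is the class frequency in the long-tailed training data, so the scores better reflect a balanced (uniform) prior. The class counts, the logits, and the `tau` parameter below are illustrative assumptions.

```python
import math

def class_priors(counts):
    """Empirical class frequencies pi_c = n_c / N from per-class example counts."""
    total = sum(counts)
    return [n / total for n in counts]

def adjust_logits(logits, priors, tau=1.0):
    """Post-hoc logit adjustment (stand-in technique, not KRDistill's exact rule).

    Subtracting tau * log(pi_c) boosts rare classes relative to frequent ones,
    counteracting the head-class bias learned from long-tailed data.
    """
    return [z - tau * math.log(p) for z, p in zip(logits, priors)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical 3-class long-tailed split: head, medium, tail.
counts = [5000, 500, 50]
priors = class_priors(counts)
teacher_logits = [2.0, 1.0, 1.5]
adjusted = adjust_logits(teacher_logits, priors)

print(max(range(3), key=lambda c: teacher_logits[c]))  # → 0 (head class)
print(max(range(3), key=lambda c: adjusted[c]))        # → 2 (tail class)
```

In this toy case the raw logits favor the head class, while the adjusted logits favor the tail class; larger `tau` values shift scores further toward rare classes.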

Abstract

Knowledge Distillation (KD) transfers knowledge from a large pre-trained teacher network to a compact, efficient student network, making it suitable for deployment on resource-limited media terminals. However, traditional KD methods require balanced data for robust training, and such data is often unavailable in practical applications, where a few head categories account for a large proportion of the examples. This imbalance biases the trained teacher network towards the head categories, severely degrading performance on the underrepresented tail categories for both the teacher and the student networks. In this paper, we propose a novel framework called Knowledge Rectification Distillation (KRDistill), which addresses the imbalanced knowledge inherited by the teacher network by incorporating balanced category priors. Furthermore, we rectify the biased predictions produced by the teacher network, particularly for the tail categories. Consequently, the teacher network can provide balanced and accurate knowledge to train a reliable student network. Extensive experiments on various long-tailed datasets demonstrate that KRDistill can effectively train reliable student networks in realistic scenarios of data imbalance.
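A common form of the traditional KD referred to above is logit distillation, in which the student is trained to match the teacher's temperature-softened output distribution. The minimal sketch below shows the classic Hinton-style loss for a single example in pure Python. It is background only, not KRDistill's objective, and the `temperature` and `alpha` defaults are assumptions. Because the KL term pulls the student toward whatever the teacher predicts, a head-biased teacher passes that bias directly to the student, which is the failure mode the paper targets.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.9):
    """Classic logit-distillation loss for one example (not KRDistill's objective).

    alpha weights the soft-target KL(teacher || student) term; (1 - alpha)
    weights the hard-label cross-entropy. The temperature**2 factor keeps the
    soft-target gradients on a comparable scale across temperatures.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

# Illustrative logits for a 3-class problem.
z_teacher = [2.0, 0.5, -1.0]
z_student = [1.0, 1.0, 0.0]
print(kd_loss(z_student, z_teacher, label=0))
```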

Paper Structure

This paper contains 11 sections, 8 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: (a) Comparison of example distributions in balanced datasets and in long-tailed data from practical scenarios. (b) Per-category top-1 error rate of the teacher network (ResNet-110) on the CIFAR10-LT dataset.
  • Figure 2: Visualization of (a) feature representations generated by the imbalanced teacher network and (b) teacher feature representations modified by our method, on the CIFAR10-LT dataset. The number of examples in each category is marked on the right.
  • Figure 3: The framework diagram of the proposed Knowledge Rectification Distillation. Ideal feature representations rectify the imbalanced teacher features, transferring representation knowledge with clear class boundaries to the student network. Misclassified teacher predictions are adaptively corrected and rebalanced, preventing imbalanced teacher predictions from misleading the student network.
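Figure 3 states that misclassified teacher predictions are corrected before their knowledge is transferred, but the caption gives no formula. One simple, hypothetical way to correct a wrong teacher prediction is to swap the logit of the teacher's predicted class with that of the ground-truth class, so the true class ranks first while the remaining logits are kept. The function `correct_teacher_logits` is an illustrative assumption, not the paper's adaptive correction, and it omits the rebalancing step.

```python
def correct_teacher_logits(logits, label):
    """Hypothetical correction of a misclassified teacher prediction.

    If the ground-truth class already has the highest logit, the logits are
    returned unchanged. Otherwise the logit of the (wrong) predicted class is
    swapped with the logit of the true class, making the true class the top-1
    prediction while leaving every other class's logit untouched.
    Ties between several wrong classes are not handled specially.
    """
    fixed = list(logits)
    if logits[label] >= max(logits):
        return fixed  # teacher is already correct
    pred = max(range(len(logits)), key=lambda c: logits[c])
    fixed[pred], fixed[label] = fixed[label], fixed[pred]
    return fixed

# The teacher wrongly favors class 0 (say, a head class); the true label is 2.
teacher = [3.0, 1.0, 0.5]
print(correct_teacher_logits(teacher, label=2))  # → [0.5, 1.0, 3.0]
```

Swapping, rather than overwriting the teacher output with a one-hot target, keeps the relative scores of the non-target classes, which is the part of the teacher's output that distillation usually aims to preserve.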