Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

Junfei Yi, Jianxu Mao, Tengfei Liu, Mingjie Li, Hanyu Gu, Hui Zhang, Xiaojun Chang, Yaonan Wang

TL;DR

This work tackles the problem that knowledge distillation for object detection often trusts a potentially imperfect teacher. It introduces the Uncertainty Estimation-Discriminative Knowledge Extraction-Knowledge Transfer (UET) paradigm, which injects knowledge uncertainty into feature-based KD using Monte Carlo dropout and a residual integration term, enabling the student to explore latent knowledge without extra complexity. The approach is demonstrated to yield SoTA results across multiple detectors and backbones on MS COCO and generalizes well to different KD methods, including both feature-based and logits-based distillation. The proposed method offers robust improvements with minimal computational overhead, making uncertainty-aware KD practical for real-world object detection systems.

Abstract

Knowledge distillation (KD) is a widely adopted and effective method for compressing models in object detection tasks. Particularly, feature-based distillation methods have shown remarkable performance. Existing approaches often ignore the uncertainty in the teacher model's knowledge, which stems from data noise and imperfect training. This limits the student model's ability to learn latent knowledge, as it may overly rely on the teacher's imperfect guidance. In this paper, we propose a novel feature-based distillation paradigm with knowledge uncertainty for object detection, termed "Uncertainty Estimation-Discriminative Knowledge Extraction-Knowledge Transfer (UET)", which can seamlessly integrate with existing distillation methods. By leveraging the Monte Carlo dropout technique, we introduce knowledge uncertainty into the training process of the student model, facilitating deeper exploration of latent knowledge. Our method performs effectively during the KD process without requiring intricate structures or extensive computational resources. Extensive experiments validate the effectiveness of our proposed approach across various distillation strategies, detectors, and backbone architectures. Specifically, following our proposed paradigm, the existing FGD method achieves state-of-the-art (SoTA) performance, with ResNet50-based GFL achieving 44.1% mAP on the COCO dataset, surpassing the baselines by 3.9%.
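The abstract says Monte Carlo dropout is used to introduce uncertainty into the teacher's knowledge during distillation. The following is a minimal sketch of that general idea only, not the authors' UET implementation: it applies several random dropout masks to a teacher feature vector, summarizes the spread across samples as an uncertainty estimate, and averages a feature-mimicking MSE loss over the perturbed copies. The function names (`mc_dropout_features`, `uncertain_distill_loss`), the dropout rate, the number of samples, and the use of plain Python lists in place of multi-scale feature maps are all assumptions made for illustration.

```python
import random


def mc_dropout_features(features, p=0.1, num_samples=8, seed=0):
    """Sample dropout-perturbed copies of a feature vector (hypothetical helper).

    Returns the samples plus the per-element mean and variance across samples;
    the variance serves as a simple uncertainty estimate.
    """
    assert 0.0 <= p < 1.0, "dropout rate must be in [0, 1)"
    rng = random.Random(seed)
    keep = 1.0 - p
    samples = []
    for _ in range(num_samples):
        # Inverted dropout: zero each element with probability p and
        # rescale the survivors by 1 / keep so the expectation is unchanged.
        samples.append([x / keep if rng.random() < keep else 0.0 for x in features])
    n = len(features)
    mean = [sum(s[i] for s in samples) / num_samples for i in range(n)]
    var = [sum((s[i] - mean[i]) ** 2 for s in samples) / num_samples for i in range(n)]
    return samples, mean, var


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def uncertain_distill_loss(student, teacher, p=0.1, num_samples=8, seed=0):
    """Average the student-teacher MSE over dropout-perturbed teacher features.

    Assumption: this stands in for a feature-based KD loss; real methods such
    as FGD use richer, spatially weighted losses on tensors.
    """
    samples, _, _ = mc_dropout_features(teacher, p, num_samples, seed)
    return sum(mse(student, s) for s in samples) / num_samples


if __name__ == "__main__":
    teacher_feat = [1.0, 2.0, 0.5, 4.0]
    student_feat = [0.8, 1.5, 0.7, 3.0]
    _, mean, var = mc_dropout_features(teacher_feat, p=0.2)
    print("mean:", mean)
    print("variance (uncertainty):", var)
    print("loss:", uncertain_distill_loss(student_feat, teacher_feat, p=0.2))
```

With `p=0.0` every sample equals the original teacher features, so the variance is zero and the loss reduces to an ordinary feature-mimicking MSE. In a real detector, dropout would typically be applied to feature tensors inside the network rather than to Python lists.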

Paper Structure (21 sections, 6 equations, 4 figures, 8 tables, 1 algorithm)

Figures (4)

  • Figure 1: We choose the popular ResNet-101-based GFL detector, trained for 24 epochs with multi-scale training on the COCO dataset, as the teacher detector, and assess its performance on the COCO training set. The heatmap on the left shows the multi-scale features of the GFL detector, the middle panel shows the prediction results of GFL, and the right panel presents its evaluation performance.
  • Figure 2: The overview of the proposed UET paradigm.
  • Figure 3: Convergence analysis for the lightweight detectors.
  • Figure 4: Visualization of detection results of our method.