Task Integration Distillation for Object Detectors

Hai Su, ZhenWen Jian, Songsen Yu

TL;DR

This work tackles the mismatch in knowledge distillation for object detectors, where prior methods primarily target classification and neglect regression. It introduces Task Integration Distillation (TID), a three-module framework (DIEM, LDAM, SFDM) that jointly models classification and localization, assesses the learner's current state, and selectively decouples feature maps by value to guide distillation. Core contributions include a dual-task importance score, learning-condition–aware area selection, and a three-way feature decoupling scheme that emphasizes high-value regions while attenuating low-value areas. Empirical results on MS COCO 2017 and VOC across GFL and ATSS show consistent gains over existing KD approaches, with ablations confirming the value of balancing tasks, using detector-consistent localization signals, and leveraging learning dynamics. Overall, TID provides a practical, generalizable route to more efficient and accurate lightweight detectors by faithfully reflecting the dual-task learning condition during distillation.

Abstract

Knowledge distillation is a widely adopted technique for model compression. However, most knowledge distillation methods perform unsatisfactorily on object detection. Such approaches typically consider only the classification task among the detector's two sub-tasks and largely overlook the regression task. This oversight gives only a partial picture of the detector's overall task, resulting in skewed estimates and potentially adverse effects. Therefore, we propose a knowledge distillation method that addresses both the classification and regression tasks and incorporates a task significance strategy. By evaluating feature importance from the outputs of the detector's two sub-tasks, our approach gives balanced consideration to both classification and regression. Drawing inspiration from real-world teaching and the definition of learning condition, we introduce a method that focuses on both key and weak areas. By assessing the value of features for knowledge distillation from their importance differences, we accurately capture the model's current learning situation. This prevents the biased estimates of the model's actual learning state that arise from incomplete use of the detector's outputs.
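
To make the flow described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that per-location importance is the top class score multiplied by a localization-quality value, that distillation value is the absolute importance gap between teacher and student, and that the three-way split uses two fixed thresholds. All function names, thresholds, and weights here are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def importance(cls_scores: Sequence[Sequence[float]],
               loc_quality: Sequence[float]) -> List[float]:
    """Per-location importance from both sub-task outputs.

    Assumption: multiply the top class score by a localization-quality
    value (for example, the IoU of the predicted box).
    """
    return [max(c) * q for c, q in zip(cls_scores, loc_quality)]


def distill_value(teacher_imp: Sequence[float],
                  student_imp: Sequence[float]) -> List[float]:
    """Assumption: value is the teacher/student importance gap."""
    return [abs(t - s) for t, s in zip(teacher_imp, student_imp)]


def three_way_weights(values: Sequence[float],
                      low: float = 0.1, high: float = 0.3,
                      w_high: float = 1.0, w_mid: float = 0.5,
                      w_low: float = 0.1) -> List[float]:
    """Split locations into high/medium/low value and weight them.

    Thresholds and weights are illustrative placeholders only.
    """
    weights = []
    for v in values:
        if v >= high:
            weights.append(w_high)   # emphasize high-value regions
        elif v >= low:
            weights.append(w_mid)
        else:
            weights.append(w_low)    # attenuate low-value regions
    return weights


def weighted_feature_loss(teacher_feat: Sequence[float],
                          student_feat: Sequence[float],
                          weights: Sequence[float]) -> float:
    """Weighted mean squared error over flattened feature locations."""
    total = sum(w * (t - s) ** 2
                for w, t, s in zip(weights, teacher_feat, student_feat))
    return total / len(weights)


# Toy example with three locations and two classes.
t_imp = importance([[0.9, 0.1], [0.2, 0.1], [0.6, 0.3]], [0.8, 0.5, 0.5])
s_imp = importance([[0.5, 0.2], [0.2, 0.1], [0.3, 0.3]], [0.6, 0.5, 0.5])
weights = three_way_weights(distill_value(t_imp, s_imp))
loss = weighted_feature_loss([1.0, 2.0, 3.0], [0.0, 2.0, 1.0], weights)
```

In this toy run the first location, where the student's importance lags the teacher's the most, receives the largest weight, while the location on which both models already agree is down-weighted.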

Paper Structure (21 sections, 12 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Overview of the Task Integration Distillation (TID) approach. It comprises the Dual-Task Importance Evaluation Module (DIEM), which quantifies the model's outputs; the Learning Dynamics Assessment Module (LDAM), which reflects the model's current learning condition based on output value; and the Selective Feature Decoupling Module (SFDM), which decouples feature maps according to the learning condition. For conciseness, only single-level FPN features and predictions are shown.
  • Figure 2: Details of the Dual-Task Importance Evaluation Module. For clarity, operations are shown on a single feature map. The red regions mark the sub-regions of the feature map currently being processed; each sub-region receives an area importance score based on the classification and regression outputs and on the intrinsic settings of the object detection model.
  • Figure 3: PR curves and error analysis for different models. 'Correct': predictions with the correct label and an IoU greater than 0.5; 'Oth': false positives between classes, i.e., predictions with incorrect labels; 'FN': missed detections; 'Sim': predictions with incorrect labels but correct supercategories; 'BG': false alarms predicted in background areas; 'Loc': predictions with the correct label but an IoU between 0.1 and 0.5.
  • Figure 4: Feature areas selected according to the model's learning condition across different FPN layers, shown from left to right. The leftmost panel corresponds to the lowest-level FPN features and the rightmost to the highest-level FPN features.
  • Figure 5: Qualitative results on the COCO 2017 dataset for the baseline detector and the GFL-ResNet50 detector distilled with our method.
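
The error categories in the Figure 3 caption can be read as a decision rule for a single prediction. The sketch below is one possible reading, not the evaluation code used in the paper. The `GroundTruth` tuple, the function name, the priority order of the checks, and the use of IoU >= 0.1 as the overlap threshold for 'Sim' and 'Oth' are assumptions. 'FN' is a property of ground-truth boxes rather than of predictions, so it is not returned here.

```python
from typing import NamedTuple, Sequence


class GroundTruth(NamedTuple):
    """A ground-truth box as seen from one prediction (hypothetical type)."""
    label: str
    supercategory: str
    iou: float  # IoU between this ground-truth box and the prediction


def classify_prediction(pred_label: str,
                        pred_supercategory: str,
                        gts: Sequence[GroundTruth],
                        loc_iou: float = 0.1,
                        correct_iou: float = 0.5) -> str:
    """Assign one of 'Correct', 'Loc', 'Sim', 'Oth', 'BG' to a prediction."""
    # Best overlap with a ground-truth box of the same class.
    best_same = max((g.iou for g in gts if g.label == pred_label),
                    default=0.0)
    if best_same > correct_iou:
        return "Correct"   # right label, IoU above 0.5
    if best_same >= loc_iou:
        return "Loc"       # right label, IoU between 0.1 and 0.5

    # Remaining overlapping boxes necessarily carry a different label.
    overlapping = [g for g in gts if g.iou >= loc_iou]
    if any(g.supercategory == pred_supercategory for g in overlapping):
        return "Sim"       # wrong label, right supercategory
    if overlapping:
        return "Oth"       # wrong label, wrong supercategory
    return "BG"            # no meaningful overlap with any object


example = classify_prediction(
    "dog", "animal", [GroundTruth("cat", "animal", 0.6)])
```

Here `example` evaluates to `"Sim"`, because the prediction overlaps a box whose label differs but whose supercategory matches.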