Task Integration Distillation for Object Detectors
Hai Su, ZhenWen Jian, Songsen Yu
TL;DR
This work tackles the mismatch in knowledge distillation for object detectors, where prior methods primarily target classification and neglect regression. It introduces Task Integration Distillation (TID), a three-module framework (DIEM, LDAM, SFDM) that jointly models classification and localization, assesses the learner's current state, and selectively decouples feature maps by value to guide distillation. Core contributions include a dual-task importance score, learning-condition–aware area selection, and a three-way feature decoupling scheme that emphasizes high-value regions while attenuating low-value areas. Empirical results on MS COCO 2017 and VOC across GFL and ATSS show consistent gains over existing KD approaches, with ablations confirming the value of balancing tasks, using detector-consistent localization signals, and leveraging learning dynamics. Overall, TID provides a practical, generalizable route to more efficient and accurate lightweight detectors by faithfully reflecting the dual-task learning condition during distillation.
Abstract
Knowledge distillation is a widely adopted technique for model compression. However, most knowledge distillation methods perform unsatisfactorily on object detection. These approaches typically consider only the classification task among the detector's two sub-tasks and largely overlook the regression task. As a result, they capture only part of the detector's overall task, which leads to skewed estimations and potentially adverse effects. We therefore propose a knowledge distillation method that addresses both the classification and regression tasks through a task significance strategy. By evaluating the importance of features from the outputs of the detector's two sub-tasks, our approach gives balanced consideration to classification and regression. Drawing inspiration from real-world teaching processes and the definition of learning condition, we introduce a method that focuses on both key and weak areas. By assessing the value of features for knowledge distillation based on their importance differences, we accurately capture the model's current learning situation. This prevents the biased predictions about the model's learning state that arise when the detector's outputs are only partially used.
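To make the idea concrete, the following is a minimal NumPy sketch of how a dual-task importance score and an importance-difference weighting might be computed. It is not the authors' implementation. The function names, the product used to combine classification and localization, the teacher/student gap as the "importance difference", and all thresholds and weights are assumptions chosen for illustration.

```python
import numpy as np


def importance_map(cls_scores, loc_quality):
    """Per-location importance that uses both sub-task outputs.

    cls_scores: (H, W, C) class probabilities.
    loc_quality: (H, W) localization quality in [0, 1], e.g. IoU with a matched box.
    Assumption: the two signals are combined by a simple product.
    """
    return cls_scores.max(axis=-1) * loc_quality


def distillation_weights(teacher_imp, student_imp,
                         high=0.3, low=0.05,
                         w_high=1.0, w_mid=0.5, w_low=0.1):
    """Split locations into three groups by the teacher/student importance gap.

    Assumption: a large gap marks a high-value area for distillation and a
    small gap marks a low-value area. Thresholds and weights are hypothetical.
    """
    gap = np.abs(teacher_imp - student_imp)
    weights = np.full(gap.shape, w_mid, dtype=float)
    weights[gap >= high] = w_high   # emphasize high-value regions
    weights[gap < low] = w_low      # attenuate low-value regions
    return weights


def weighted_feature_loss(teacher_feat, student_feat, weights):
    """Weighted mean squared error between (H, W, D) feature maps."""
    sq_err = ((teacher_feat - student_feat) ** 2).mean(axis=-1)
    return float((weights * sq_err).sum() / weights.sum())
```

In this sketch, locations where the student's importance estimate is far from the teacher's receive the largest weight, which loosely mirrors the emphasis on key and weak areas described above. A real detector would derive `loc_quality` from its own regression head.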
