Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection

Longrong Yang; Xianpan Zhou; Xuewei Li; Liang Qiao; Zheyang Li; Ziwei Yang; Gaoang Wang; Xi Li

TL;DR

The paper proposes a distillation method with cross-task consistent protocols, tailored for dense object detection, together with an IoU-based Localization Distillation Loss that does not depend on specific network structures and can be compared with existing localization distillation losses.

Abstract

Knowledge distillation (KD) has shown potential for learning compact models in dense object detection. However, the commonly used softmax-based distillation ignores the absolute classification scores of individual categories. As a result, reaching the optimum of the distillation loss does not necessarily yield the optimal student classification scores for dense object detectors. This cross-task protocol inconsistency is critical, especially for dense object detectors, since the foreground categories are extremely imbalanced. To address the protocol differences between distillation and classification, we propose a novel distillation method with cross-task consistent protocols, tailored for dense object detection. For classification distillation, we address the cross-task protocol inconsistency problem by formulating the classification logit maps in both teacher and student models as multiple binary-classification maps and applying a binary-classification distillation loss to each map. For localization distillation, we design an IoU-based Localization Distillation Loss that is free from specific network structures and can be compared with existing localization distillation losses. Our proposed method is simple but effective, and experimental results demonstrate its superiority over existing methods. Code is available at https://github.com/TinyTigerPan/BCKD.
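To make the inconsistency concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single location with three categories and made-up logits, and it uses a per-category binary KL divergence as a stand-in for the binary-classification distillation loss (it differs from binary cross entropy with soft targets only by a term that does not depend on the student). The function names `softmax_kd_loss` and `binary_kd_loss` are hypothetical, and any loss weighting used in the paper is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax_kd_loss(student_logits, teacher_logits):
    """KL(teacher || student) between softmax distributions at one location."""
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    return sum(t * math.log(t / s) for t, s in zip(p_t, p_s))


def binary_kd_loss(student_logits, teacher_logits, eps=1e-12):
    """Sum over categories of KL between Bernoulli(teacher) and Bernoulli(student).

    Each category is treated as its own binary problem, so the loss is zero
    only when every student sigmoid score equals the teacher's score.
    """
    loss = 0.0
    for z_s, z_t in zip(student_logits, teacher_logits):
        t = min(max(sigmoid(z_t), eps), 1.0 - eps)
        s = min(max(sigmoid(z_s), eps), 1.0 - eps)
        loss += t * math.log(t / s) + (1.0 - t) * math.log((1.0 - t) / (1.0 - s))
    return loss


# Illustrative logits (assumed values): the student equals the teacher
# shifted by a constant, which softmax cannot see but sigmoid can.
teacher = [2.0, 0.0, -1.0]
student = [z - 3.0 for z in teacher]

print("softmax KD loss:", softmax_kd_loss(student, teacher))
print("teacher sigmoid scores:", [round(sigmoid(z), 3) for z in teacher])
print("student sigmoid scores:", [round(sigmoid(z), 3) for z in student])
print("binary KD loss:", binary_kd_loss(student, teacher))
```

In this toy case the softmax-based loss vanishes although the student's sigmoid scores are far lower than the teacher's, while the binary formulation still reports a positive loss.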

Paper Structure (29 sections, 10 equations, 6 figures, 12 tables, 1 algorithm)

Introduction
Related Works
Object Detection
Knowledge Distillation
Methodology
Overview
Binary Classification Distillation Loss
IoU-based Localization Distillation Loss
Total Distillation Loss
Experiments and Results
Datasets and Evaluation Metrics
Main Results
Ablation Analysis
Conclusion
Acknowledgements
...and 14 more sections

Figures (6)

Figure 1: (a) In dense object detection, the sum of classification scores varies from sample to sample across positions on the dense maps, which differs markedly from image classification. (b) The cross-task protocol inconsistency problem arises in dense object detection from the mismatch between the Sigmoid protocol used in this task and the Softmax protocol used in classification distillation. Specifically, even when the classification distillation loss equals 0, the scores of the student and teacher models can still disagree.
Figure 2: Distillation pipeline of our method. We leverage two novel distillation losses tailored for the object detection task. (i) Binary Classification Distillation Loss $\mathcal{L}_{cls}^{dis}$, which represents classification logit maps as multiple binary-classification maps and distills classification knowledge through a distillation loss similar to binary cross entropy. (ii) IoU-based Localization Distillation Loss $\mathcal{L}_{loc}^{dis}$, which transfers localization knowledge from teacher models to student models by computing the IoUs between predicted bounding boxes from both models and using the IoU loss. Best viewed in color.
Figure 3: Visualization of the summed L1 error of the post-Sigmoid classification scores between the teacher (GFocal-Res101) and the student (GFocal-Res50) at different levels of the Feature Pyramid Network (FPN). Our proposed method achieves a significant reduction in errors at almost all locations compared to the state-of-the-art method LD. To make subtle differences easier to see, we bound the error between 0 and 0.4. Darker is better. Best viewed in color.
Figure 4: Error analysis conducted using the TIDE toolbox [bolya2020tide]. The decrease in average precision (dAP) resulting from two types of errors (i.e., Cls, Loc) is reported. The student model without any distillation losses is denoted as "Baseline", while the use of Binary Classification Distillation Loss and the application of IoU-based Localization Distillation Loss are denoted as "BCDL" and "ILDL", respectively.
Figure 5: Visualization of intermediate training phases on GFocal-Res18, GFocal-Res34 and GFocal-Res50.
...and 1 more figures
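The localization branch described in the Figure 2 caption (IoUs between the boxes predicted by the two models, followed by an IoU loss) can be sketched as below. This is an illustrative sketch under assumptions, not the paper's code: boxes are assumed to be `(x1, y1, x2, y2)` tuples already decoded at matching locations, the loss is taken as the plain mean of `1 - IoU`, and the names `box_iou` and `iou_distill_loss` are hypothetical. Any per-location weighting or IoU variant the authors may use is left out.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_distill_loss(student_boxes, teacher_boxes):
    """Mean of (1 - IoU) between student and teacher boxes at the same locations.

    Only decoded boxes are needed, so the sketch makes no assumption about
    how the detection head represents its regression output.
    """
    if len(student_boxes) != len(teacher_boxes):
        raise ValueError("student and teacher must predict the same locations")
    if not student_boxes:
        return 0.0
    losses = [1.0 - box_iou(s, t) for s, t in zip(student_boxes, teacher_boxes)]
    return sum(losses) / len(losses)


# Illustrative boxes (assumed values): the student covers half of the teacher box.
teacher_boxes = [(0.0, 0.0, 10.0, 10.0)]
student_boxes = [(0.0, 0.0, 10.0, 5.0)]
print("IoU distillation loss:", iou_distill_loss(student_boxes, teacher_boxes))
```

Because the loss is defined on the predicted boxes themselves, the same function applies whether the head regresses offsets directly or predicts a distribution that is decoded into a box first.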
