Learnable Instance Attention Filtering for Adaptive Detector Distillation

Chen Liu, Qizhen Lan, Zhicheng Ding, Xinyu Chu, Qing Tian

Abstract

As deep vision models grow increasingly complex in pursuit of higher performance, deployment efficiency has become a critical concern. Knowledge distillation (KD) mitigates this issue by transferring knowledge from large teacher models to compact student models. While many feature-based KD methods rely on spatial filtering to guide distillation, they typically treat all object instances uniformly, ignoring instance-level variability. Moreover, existing attention-filtering mechanisms are usually heuristic or teacher-driven rather than learned jointly with the student. To address these limitations, we propose Learnable Instance Attention Filtering for Adaptive Detector Distillation (LIAF-KD), a novel framework that introduces learnable instance selectors to dynamically evaluate and reweight instance importance during distillation. Notably, the student contributes to this process based on its evolving learning state. Experiments on the KITTI and COCO datasets demonstrate consistent improvements, including a 2% gain for a GFL ResNet-50 student without added complexity, outperforming state-of-the-art methods.
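
The abstract describes learnable instance selectors that reweight per-instance distillation using both the teacher's knowledge and the student's evolving state. The sketch below illustrates one plausible form of this idea in PyTorch; the class name InstanceSelector, the MLP gating architecture, and the MSE feature-imitation loss are assumptions made for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InstanceSelector(nn.Module):
    # Hypothetical selector (a sketch, not the paper's exact design): a small
    # MLP that scores each object instance from concatenated teacher/student
    # instance features and emits a weight in (0, 1).
    def __init__(self, feat_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, t_feats: torch.Tensor, s_feats: torch.Tensor) -> torch.Tensor:
        # t_feats, s_feats: (num_instances, feat_dim) pooled per-instance features.
        # Conditioning on the student features lets the selector track the
        # student's evolving learning state.
        logits = self.scorer(torch.cat([t_feats, s_feats], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)  # (num_instances,)


def weighted_instance_distill_loss(t_feats, s_feats, selector):
    # Per-instance feature-imitation loss, rescaled by the learned weights.
    # The teacher target is detached; gradients flow into both the student
    # and the selector.
    weights = selector(t_feats.detach(), s_feats)
    per_inst = F.mse_loss(s_feats, t_feats.detach(), reduction="none").mean(dim=-1)
    return (weights * per_inst).sum() / weights.sum().clamp_min(1e-6)
```

In this sketch the selector's parameters would be trained jointly with the student through the distillation loss itself. A practical implementation would likely also need a regularizer to keep the selector from simply down-weighting all hard instances; the paper's actual training objective is not reproduced here.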

Figures (3)

  • Figure 1: Overview of the proposed LIAF-KD framework. It employs instance selectors to dynamically reweight instances during distillation, guided by both the teacher’s knowledge and the student’s learning dynamics.
  • Figure 2: Detection visualization of different models. Columns (a) and (b) show the input images and ground truth; columns (c), (d), and (e) show the detection results of the student baseline, MasKD, and LIAF-KD, respectively.
  • Figure 3: Grad-CAM attention maps of different models. Colors indicate attention intensity, with red the highest and blue the lowest. "Base" denotes the student baseline.