Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation
Jiawei Liang, Siyuan Liang, Aishan Liu, Ke Ma, Jingzhi Li, Xiaochun Cao
TL;DR
This work addresses a gap in knowledge distillation (KD) for object detection: standard KD concentrates on human-aligned knowledge and misses the teacher's counter-intuitive perceptions. It proposes inconsistent knowledge distillation (IKD), which uses two augmentation-based mechanisms (a frequency-aware, sample-specific data augmentation and an adversarial feature augmentation) to mine and transfer non-human perceptual knowledge from the teacher. Evaluations on COCO-2017 across one-stage, two-stage, and anchor-free detectors show consistent mAP improvements, with gains of up to +1.0 mAP and a top result of 43.0 mAP on RepPoints, demonstrating the practicality of enriching KD via data augmentation. The findings suggest that data augmentation is a general, effective tool for broadening knowledge transfer in KD for object detection, with potential extensions to fidelity and defense tasks.
Abstract
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model. Although the teacher model perceives data differently from humans, existing KD methods distill only knowledge that is consistent with labels annotated by human experts and neglect knowledge that is inconsistent with human perception, which results in insufficient distillation and sub-optimal performance. In this paper, we propose inconsistent knowledge distillation (IKD), which aims to distill the knowledge inherent in the teacher model's counter-intuitive perceptions. We start by considering the teacher model's counter-intuitive perceptions of frequency and non-robust features. Unlike previous works that exploit fine-grained features or introduce additional regularizations, we extract inconsistent knowledge by providing diverse inputs through data augmentation. Specifically, we propose a sample-specific data augmentation to transfer the teacher model's ability to capture distinct frequency components, and an adversarial feature augmentation to extract the teacher model's perceptions of non-robust features in the data. Extensive experiments demonstrate the effectiveness of our method, which outperforms state-of-the-art KD baselines on one-stage, two-stage, and anchor-free object detectors (by up to +1.0 mAP). Our code will be made available at https://github.com/JWLiang007/IKD.git.
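To make the notion of "distinct frequency components" concrete, the following minimal sketch splits a small grayscale image into a low-frequency part (a box blur) and a high-frequency residual, then recombines them with adjustable gains. It is an illustration only and not the IKD augmentation itself: the function names, the box-blur filter, and the `low_gain`/`high_gain` parameters are all assumptions introduced here, and the abstract does not specify how the sample-specific gains would be chosen.

```python
def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2r+1)x(2r+1) window, clipped at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def split_frequencies(img, radius=1):
    """Return (low, high) so that low + high reconstructs img."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(row, low_row)]
            for row, low_row in zip(img, low)]
    return low, high


def reweight_frequencies(img, low_gain=1.0, high_gain=1.0, radius=1):
    """Hypothetical augmentation: rescale the two bands and recombine them."""
    low, high = split_frequencies(img, radius)
    return [[low_gain * l + high_gain * h for l, h in zip(low_row, high_row)]
            for low_row, high_row in zip(low, high)]


# Toy 4x4 image with a vertical edge (values in [0, 1]).
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Emphasize high-frequency content (the edge) while keeping the low band.
augmented = reweight_frequencies(image, low_gain=1.0, high_gain=1.5)
```

With both gains set to 1.0 the image is returned unchanged (up to floating-point rounding), and with `high_gain=0.0` only the blurred, low-frequency content remains. In a distillation setting, such band-reweighted inputs would be fed to both teacher and student so that the student can imitate the teacher's responses to different frequency bands.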
