Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation

Jiawei Liang, Siyuan Liang, Aishan Liu, Ke Ma, Jingzhi Li, Xiaochun Cao

TL;DR

This work addresses a gap in knowledge distillation (KD) for object detection: standard KD concentrates on knowledge aligned with human perception and misses the teacher's counter-intuitive perceptions. It proposes inconsistent knowledge distillation (IKD), which uses two augmentation-based mechanisms, a frequency-aware, sample-specific data augmentation and an adversarial feature augmentation, to mine and transfer the teacher's non-human perceptual knowledge. Evaluations on COCO-2017 across one-stage, two-stage, and anchor-free detectors show consistent mAP improvements, with gains of up to about 1.0 mAP and a top result of 43.0 mAP on RepPoints, which demonstrates the practicality of enriching KD through data augmentation. The findings suggest that data augmentation is a general, effective tool for broadening knowledge transfer in KD for object detection, with potential extensions to fidelity and defense tasks.

Abstract

Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model. Although the teacher model perceives data differently from humans, existing KD methods distill only knowledge that is consistent with labels annotated by human experts and neglect knowledge that is inconsistent with human perception, which results in insufficient distillation and sub-optimal performance. In this paper, we propose inconsistent knowledge distillation (IKD), which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions. We start by considering the teacher model's counter-intuitive perceptions of frequency and non-robust features. Unlike previous works that exploit fine-grained features or introduce additional regularizations, we extract inconsistent knowledge by providing diverse inputs through data augmentation. Specifically, we propose a sample-specific data augmentation to transfer the teacher model's ability to capture distinct frequency components, and an adversarial feature augmentation to extract the teacher model's perceptions of non-robust features in the data. Extensive experiments demonstrate the effectiveness of our method, which outperforms state-of-the-art KD baselines on one-stage, two-stage, and anchor-free object detectors (by at most +1.0 mAP). Our code will be made available at https://github.com/JWLiang007/IKD.git.

Paper Structure

This paper contains 19 sections, 7 equations, 7 figures, 6 tables, and 1 algorithm.

Figures (7)

  • Figure 1: KD for object detection can be viewed as a combined distillation of AI knowledge and human knowledge. Since the AI model perceives the data differently from humans, we hypothesize that the model may hold informative knowledge derived from its counter-intuitive perceptions, and we investigate inconsistent knowledge distillation.
  • Figure 2: Detection results of high- and low-frequency counterparts. White dashed bounding boxes indicate missed predictions. The object detector fails to detect small objects in the low-frequency counterpart, whereas it fails to detect larger objects in the high-frequency counterpart.
  • Figure 3: Overall framework of our augmentation method. We propose a sample-specific data augmentation to distill inconsistent knowledge concerning frequency by enhancing distinct frequency components based on the size of objects. We propose an adversarial feature augmentation to distill inconsistent knowledge concerning non-robust features by imitating adversarial feature maps of the teacher model.
  • Figure 4: Fourier spectra of clean images and of images augmented with flipping, cropping, and Gaussian noise corruption. Flipping does not modify the frequency components. Cropping concentrates the spectrum in low frequencies, whereas Gaussian noise introduces more high-frequency components than clean images contain.
  • Figure 5: Visualizations of detection results of different models. Red dashed bounding boxes indicate missed predictions.
  • ...and 2 more figures
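
The high- and low-frequency counterparts in Figure 2 and the spectral effects described in Figure 4 can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `split_frequency` and `high_frequency_ratio`, the radial cutoff of 0.1 cycles per pixel, and the synthetic test image are all assumptions chosen for illustration, and the paper's actual decomposition may differ.

```python
import numpy as np


def _radius(shape):
    """Radial frequency (cycles per pixel) of every 2-D FFT bin."""
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)


def split_frequency(image, cutoff=0.1):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    `cutoff` is a hypothetical radius separating the two bands.
    """
    low_mask = _radius(image.shape) <= cutoff
    spectrum = np.fft.fft2(image)
    # The mask is symmetric in frequency, so the imaginary part of each
    # inverse transform is only floating-point error.
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * ~low_mask).real
    return low, high


def high_frequency_ratio(image, cutoff=0.1):
    """Fraction of spectral power that lies above the cutoff radius."""
    power = np.abs(np.fft.fft2(image)) ** 2
    return power[_radius(image.shape) > cutoff].sum() / power.sum()


# Synthetic stand-in for an image: a smooth blob, then a noisy copy of it.
rng = np.random.default_rng(0)
clean = np.outer(np.hanning(64), np.hanning(64))
noisy = clean + 0.1 * rng.normal(size=clean.shape)

low, high = split_frequency(noisy)
ratio_clean = high_frequency_ratio(clean)
ratio_noisy = high_frequency_ratio(noisy)
ratio_flipped = high_frequency_ratio(np.fliplr(noisy))
```

In this sketch the two parts sum back to the input, Gaussian noise raises the share of high-frequency power relative to the clean image, and horizontal flipping leaves that share unchanged, which is consistent with the observations in the Figure 4 caption. Note that flipping mirrors the magnitude spectrum rather than leaving it bin-for-bin identical; it is the power per frequency radius that is preserved.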