Dual-Stream Spectral Decoupling Distillation for Remote Sensing Object Detection

Xiangyi Gao, Danpei Zhao, Bo Yuan, Wentao Li

TL;DR

The paper addresses two problems in knowledge distillation for remote sensing object detection: entangled (mixed) feature information and overlooked subtle feature discrepancies. It introduces DS2D2, a dual-stream distillation framework that decomposes feature maps with a first-order Haar wavelet transform and then applies explicit and implicit distillation. Explicit distillation uses a Density-Independent Scale Weight (DISW) to reweight the low- and high-frequency components. Implicit distillation uses knowledge amplifiers to map subtle student-teacher feature differences to prediction gaps: a full-frequency amplifier derived from the teacher head, and a high-frequency amplifier for directional cues. Evaluations on DIOR and DOTA show that DS2D2 consistently surpasses state-of-the-art distillation methods with both RetinaNet and Faster R-CNN detectors, particularly for small and dense objects. The approach is architecture-agnostic and robust to the choice of wavelet basis, and it offers practical gains for lightweight remote sensing detectors. The authors state that code will be released.

Abstract

Knowledge distillation is an effective, hardware-friendly technique that plays a key role in making remote sensing object detectors lightweight. However, existing distillation methods often suffer from mixed features in remote sensing images (RSIs) and neglect the discrepancies caused by subtle feature variations, which leads to confusion among entangled knowledge. To address these challenges, we propose an architecture-agnostic distillation method named Dual-Stream Spectral Decoupling Distillation (DS2D2) for universal remote sensing object detection tasks. Specifically, DS2D2 integrates explicit and implicit distillation grounded in spectral decomposition. First, a first-order wavelet transform is applied for spectral decomposition, which preserves the critical spatial characteristics of RSIs. Leveraging this spatial preservation, a Density-Independent Scale Weight (DISW) is designed to address the challenges of dense and small object detection common in RSIs. Second, we reveal implicit knowledge hidden in subtle student-teacher feature discrepancies, which significantly influence predictions when activated by detection heads. This implicit knowledge is extracted via full-frequency and high-frequency amplifiers, which map feature differences to prediction deviations. Extensive experiments on the DIOR and DOTA datasets validate the effectiveness of the proposed method. Specifically, on the DIOR dataset, DS2D2 achieves improvements of 4.2% in AP50 for RetinaNet and 3.8% in AP50 for Faster R-CNN, outperforming existing distillation approaches. The source code will be available at https://github.com/PolarAid/DS2D2.
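
The spectral decomposition step described in the abstract can be illustrated with a minimal sketch of a single-level (first-order) 2D Haar transform. This is not the authors' implementation: the function name `haar_dwt2`, the orthonormal 1/2 scaling, and the names of the returned sub-bands are assumptions made for illustration. The sketch shows why the decomposition preserves spatial layout: each output coefficient is computed from one 2x2 block, so every sub-band is a half-resolution map aligned with the input.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform over the last two axes (illustrative sketch).

    Assumes both spatial dimensions are even. Returns one low-frequency band and
    three high-frequency bands, each at half the input resolution.
    """
    a = x[..., 0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0        # local average (low frequency)
    high_cols = (a - b + c - d) / 2.0  # change between neighbouring columns
    high_rows = (a + b - c - d) / 2.0  # change between neighbouring rows
    high_diag = (a - b - c + d) / 2.0  # diagonal change
    return low, (high_cols, high_rows, high_diag)

# Hypothetical feature map with shape (channels, height, width).
feat = np.random.default_rng(0).normal(size=(8, 16, 16))
low, highs = haar_dwt2(feat)
print(low.shape, [h.shape for h in highs])
```

With the orthonormal scaling assumed here, the total energy of the four sub-bands equals that of the input, so reweighting the low- and high-frequency components separately redistributes emphasis without discarding information.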

Paper Structure

This paper contains 16 sections, 11 equations, 8 figures, 11 tables.

Figures (8)

  • Figure 1: An overview of conventional feature distillation versus our DS2D2. Conventional methods struggle with semantic confusion and neglect implicit knowledge. We employ wavelet transforms for spectral decomposition. In addition, combining explicit and implicit distillation enables comprehensive learning.
  • Figure 2: Overview of our DS2D2, with the knowledge amplifier structure exemplified by RetinaNet. Feature maps are generated from the input image through the networks. Feature maps are decoupled into high-frequency and low-frequency components via the wavelet transform to separate entangled semantic information in remote sensing. DISW applies weighting using spatial features preserved by the wavelet transform to compute the explicit distillation loss $L^{ex}$. To capture implicit knowledge, feature maps of different spectral components are fed into corresponding knowledge amplifiers to obtain predictions, which are then used to calculate the implicit distillation loss $L^{im}$.
  • Figure 3: Schematic of spectral decomposition. Remote sensing feature maps exhibit vast and complex semantics, posing significant learning challenges. Spectral decomposition via the wavelet transform enables models to distinguish information effectively.
  • Figure 4: Schematic diagram of implicit distillation. Subtle discrepancies between teacher and student feature maps, processed by knowledge amplifiers, can significantly impact prediction results. Distilling these amplified prediction differences and leveraging backpropagation to optimize feature map learning enhances distillation performance.
  • Figure 5: Visual detection results on the DOTA dataset. Green and red boxes represent correct and incorrect predictions, respectively. Our DS2D2 achieves higher detection accuracy, particularly for small and ambiguous objects, and surpasses the teacher model in specific scenarios.
  • ...and 3 more figures
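
The idea behind implicit distillation in the Figure 4 caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's loss: the knowledge amplifier is replaced by a frozen linear head (`head_weight`), the discrepancy is measured with a plain mean squared error, and the function name `implicit_loss` is hypothetical. The sketch shows how a small feature difference becomes a much larger prediction difference once it passes through the head, and how the gradient of that amplified gap flows back to the student features.

```python
import numpy as np

def implicit_loss(f_student, f_teacher, head_weight):
    """MSE between the outputs of one frozen linear head applied to both features.

    f_student, f_teacher: arrays of shape (N, C).
    head_weight: array of shape (K, C), a stand-in for a knowledge amplifier.
    Returns the loss and its gradient with respect to f_student.
    """
    gap = (f_student - f_teacher) @ head_weight.T  # prediction difference, (N, K)
    loss = np.mean(gap ** 2)
    grad = 2.0 * gap @ head_weight / gap.size      # dL/d f_student, (N, C)
    return loss, grad

rng = np.random.default_rng(0)
f_teacher = rng.normal(size=(4, 6))
f_student = f_teacher + 1e-2 * rng.normal(size=(4, 6))  # subtle discrepancy
head_weight = 10.0 * np.eye(6)                           # assumed high-gain head

feature_mse = np.mean((f_student - f_teacher) ** 2)
loss, grad = implicit_loss(f_student, f_teacher, head_weight)
print(f"feature MSE: {feature_mse:.2e}, amplified prediction MSE: {loss:.2e}")
```

In this toy setting the head has a gain of 10, so the prediction-level error is 100 times the feature-level error. A real detection head is nonlinear, and the actual amplification depends on its weights.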