Dual-Stream Spectral Decoupling Distillation for Remote Sensing Object Detection
Xiangyi Gao, Danpei Zhao, Bo Yuan, Wentao Li
TL;DR
The paper tackles information mixture and subtle feature discrepancies in remote sensing object detection by introducing DS2D2, a dual-stream distillation framework that combines spectral decomposition via first-order Haar wavelets with explicit and implicit distillation. Explicit distillation uses Density-Independent Scale Weights to reweight low- and high-frequency components, while implicit distillation employs knowledge amplifiers to map subtle feature differences to prediction gaps, with a full-frequency amplifier derived from the teacher head and a high-frequency amplifier for directional cues. Evaluations on DIOR and DOTA show that DS2D2 consistently surpasses state-of-the-art distillation methods for both RetinaNet and Faster R-CNN detectors, with particular gains on small and dense objects. The approach is architecture-agnostic, is robust across wavelet bases, and provides practical improvements for lightweight remote sensing detectors. Code is to be released.
Abstract
Knowledge distillation is an effective, hardware-friendly technique that plays a key role in building lightweight remote sensing object detectors. However, existing distillation methods often suffer from mixed features in remote sensing images (RSIs) and neglect the discrepancies caused by subtle feature variations, which leaves the transferred knowledge entangled. To address these challenges, we propose an architecture-agnostic distillation method named Dual-Stream Spectral Decoupling Distillation (DS2D2) for universal remote sensing object detection tasks. Specifically, DS2D2 integrates explicit and implicit distillation grounded in spectral decomposition. First, a first-order wavelet transform is applied for spectral decomposition, which preserves the critical spatial characteristics of RSIs. Leveraging this spatial preservation, a Density-Independent Scale Weight (DISW) is designed to address the challenges of dense and small object detection common in RSIs. Second, we reveal implicit knowledge hidden in subtle student-teacher feature discrepancies, which significantly influence predictions when activated by detection heads. This implicit knowledge is extracted via full-frequency and high-frequency amplifiers, which map feature differences to prediction deviations. Extensive experiments on the DIOR and DOTA datasets validate the effectiveness of the proposed method. On the DIOR dataset in particular, DS2D2 improves AP50 by 4.2% for RetinaNet and by 3.8% for Faster R-CNN, outperforming existing distillation approaches. The source code will be available at https://github.com/PolarAid/DS2D2.
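To illustrate the spectral decomposition step, the following is a minimal NumPy sketch of a single-level (first-order) 2D Haar wavelet transform applied to a feature map. It is not the authors' implementation: the function names haar_dwt2 and haar_idwt2, the subband names, and the (batch, channel, H, W) layout are assumptions made for illustration. The sketch shows why such a transform keeps spatial structure: each subband is a half-resolution map whose entries still correspond to 2x2 blocks of the input, and the transform is exactly invertible.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform over the last two axes.

    Assumes the last two dimensions (H, W) are even. Returns one
    low-frequency subband and three high-frequency subbands, each of
    spatial size (H/2, W/2).
    """
    a = x[..., 0::2, 0::2]  # top-left sample of every 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right

    low = (a + b + c + d) / 2.0       # local average (low frequency)
    row_diff = (a + b - c - d) / 2.0  # top row minus bottom row
    col_diff = (a - b + c - d) / 2.0  # left column minus right column
    diag = (a - b - c + d) / 2.0      # diagonal detail
    return low, (row_diff, col_diff, diag)


def haar_idwt2(low, highs):
    """Inverse of haar_dwt2: rebuilds the full-resolution map."""
    row_diff, col_diff, diag = highs
    h, w = low.shape[-2:]
    x = np.empty(low.shape[:-2] + (2 * h, 2 * w), dtype=low.dtype)
    x[..., 0::2, 0::2] = (low + row_diff + col_diff + diag) / 2.0
    x[..., 0::2, 1::2] = (low + row_diff - col_diff - diag) / 2.0
    x[..., 1::2, 0::2] = (low - row_diff + col_diff - diag) / 2.0
    x[..., 1::2, 1::2] = (low - row_diff - col_diff + diag) / 2.0
    return x


# Hypothetical feature map: batch of 2, 3 channels, 8x8 spatial grid.
features = np.random.default_rng(0).standard_normal((2, 3, 8, 8))
low, highs = haar_dwt2(features)
reconstructed = haar_idwt2(low, highs)
```

In a distillation setting, the low- and high-frequency subbands of the teacher and student features could then be compared separately and weighted differently. How DS2D2 actually weights them (the DISW) is not specified in this abstract and is not reproduced here.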
