SO-DETR: Leveraging Dual-Domain Features and Knowledge Distillation for Small Object Detection
Huaxiang Zhang, Hao Zhang, Aoran Mei, Zhongxue Gan, Guo-Niu Zhu
TL;DR
This work tackles the persistent challenge of small object detection in DETR-like architectures by introducing SO-DETR. SO-DETR fuses spatial- and frequency-domain features through a dual-domain hybrid encoder and optimizes query allocation with an Expanded-IoU-based query selection mechanism. It also pairs a lightweight backbone with knowledge distillation to stay efficient. On the UAV-focused benchmarks VisDrone-2019-DET and UAVVaste, the method achieves competitive accuracy gains while reducing computational overhead, and ablations confirm that each component contributes complementary gains. Distillation targets the final decoder outputs and follows a linear decay schedule, which gives SO-DETR effective knowledge transfer and improved small-object localization. Overall, the approach offers a practical, efficient path to real-time small-object detection in aerial imagery. It also identifies a priority for future work: balancing high-resolution feature extraction with semantic understanding of large objects.
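The idea of distilling from final decoder outputs under a linear decay schedule can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The loss terms (temperature-scaled KL divergence on class logits plus L1 on boxes), the one-to-one pairing of student and teacher queries, and the names `output_distill_loss`, `linear_decay_weight`, and `total_loss` are all hypothetical. The sketch also assumes that the linear decay applies to the distillation loss weight.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def output_distill_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    teacher_boxes: Sequence[Sequence[float]],
    student_boxes: Sequence[Sequence[float]],
    temperature: float = 2.0,
) -> float:
    """Hypothetical distillation loss on final decoder outputs.

    Assumption: student query i is paired with teacher query i.
    Class term: temperature-scaled KL(teacher || student).
    Box term: L1 distance between (cx, cy, w, h) predictions.
    """
    cls_loss = 0.0
    box_loss = 0.0
    for t_log, s_log, t_box, s_box in zip(
        teacher_logits, student_logits, teacher_boxes, student_boxes
    ):
        p_t = softmax(t_log, temperature)
        p_s = softmax(s_log, temperature)
        cls_loss += kl_div(p_t, p_s) * temperature ** 2
        box_loss += sum(abs(a - b) for a, b in zip(t_box, s_box))
    n = max(len(teacher_logits), 1)  # average over paired queries
    return (cls_loss + box_loss) / n


def linear_decay_weight(
    epoch: int, total_epochs: int, w_start: float = 1.0, w_end: float = 0.0
) -> float:
    """Weight that decays linearly from w_start (first epoch) to w_end (last epoch)."""
    if total_epochs <= 1:
        return w_end
    t = min(max(epoch, 0), total_epochs - 1) / (total_epochs - 1)
    return w_start + (w_end - w_start) * t


def total_loss(det_loss: float, kd_loss: float, epoch: int, total_epochs: int) -> float:
    """Detection loss plus the linearly weighted distillation loss."""
    return det_loss + linear_decay_weight(epoch, total_epochs) * kd_loss


# Usage: identical teacher/student outputs give zero distillation loss.
kd = output_distill_loss(
    [[1.0, 2.0]], [[1.0, 2.0]], [[0.5, 0.5, 0.1, 0.1]], [[0.5, 0.5, 0.1, 0.1]]
)
mid_weight = linear_decay_weight(5, 11)
```

Under this schedule the student relies more on the teacher early in training and more on ground-truth supervision later. The end value `w_end` is left as a parameter because the decay target is not specified here.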
Abstract
Detection Transformer (DETR)-based methods have made significant advances in general object detection. However, detecting small objects remains challenging. One key difficulty is that existing encoders struggle to fuse low-level features efficiently. In addition, existing query selection strategies are not tailored to small objects. To address these challenges, this paper proposes an efficient model, the Small Object Detection Transformer (SO-DETR). The model comprises three key components: a dual-domain hybrid encoder, an enhanced query selection mechanism, and a knowledge distillation strategy. The dual-domain hybrid encoder integrates the spatial and frequency domains to fuse multi-scale features effectively. It enhances the representation of high-resolution features while keeping computational overhead relatively low. The enhanced query selection mechanism optimizes query initialization by dynamically selecting high-scoring anchor boxes using expanded IoU, thereby improving the allocation of query resources. Furthermore, by combining a lightweight backbone network with a knowledge distillation strategy, we obtain an efficient detector for small objects. Experimental results on the VisDrone-2019-DET and UAVVaste datasets demonstrate that SO-DETR outperforms existing methods with similar computational demands. The project page is available at https://github.com/ValiantDiligent/SO_DETR.
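One plausible reading of query selection with expanded IoU is sketched below, purely as an assumption. In this reading, each box is enlarged about its center by a hypothetical `factor` before the IoU is computed. Tiny boxes that narrowly miss each other then still receive a nonzero overlap score. The helper names, the (cx, cy, w, h) box format, the default factor of 2.0, and the use of the best expanded IoU against ground truth as a soft score target are illustrative only. The paper's exact definition may differ.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (cx, cy, w, h).
Box = Tuple[float, float, float, float]


def to_corners(b: Box) -> Tuple[float, float, float, float]:
    """Convert a center-format box to (x1, y1, x2, y2)."""
    cx, cy, w, h = b
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou(a: Box, b: Box) -> float:
    """Standard IoU between two center-format boxes."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def expanded_iou(a: Box, b: Box, factor: float = 2.0) -> float:
    """IoU after scaling both boxes' width and height by `factor` (assumed definition)."""

    def grow(box: Box) -> Box:
        return (box[0], box[1], box[2] * factor, box[3] * factor)

    return iou(grow(a), grow(b))


def expanded_iou_targets(
    anchors: Sequence[Box], gts: Sequence[Box], factor: float = 2.0
) -> List[float]:
    """Hypothetical soft score target per anchor: best expanded IoU over ground truths."""
    return [max((expanded_iou(a, g, factor) for g in gts), default=0.0) for a in anchors]


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring anchors, used to initialize decoder queries."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Usage: two unit boxes whose centers are 1.5 apart do not overlap.
a, b = (0.0, 0.0, 1.0, 1.0), (1.5, 0.0, 1.0, 1.0)
plain, expanded = iou(a, b), expanded_iou(a, b)
```

For these two boxes the plain IoU is 0, while the expanded IoU with factor 2.0 is 1/7. This illustrates how an expanded overlap measure could give small objects a usable ranking signal where plain IoU gives none.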
