Boost UAV-based Object Detection via Scale-Invariant Feature Disentanglement and Adversarial Learning
Fan Liu, Liang Yao, Chuanyi Zhang, Ting Wu, Xinlei Zhang, Xiruo Jiang, Jun Zhou
TL;DR
This work tackles the challenge of detecting small objects in UAV imagery under real-time constraints. It introduces SIFDAL, a plug-and-play framework that combines Scale-Invariant Feature Disentangling (SIFD) and Adversarial Feature Learning (AFL) to extract discriminative scale-invariant features from high-resolution FPN layers, aided by height-based supervision and correlation-based disentanglement losses. The approach yields consistent accuracy gains across lightweight detectors and datasets and achieves SoTA performance on two UAV benchmarks. The work also introduces State-Air, a multi-modal UAV dataset annotated with UAV flight state parameters. The results demonstrate that focusing on scale-invariant representations in the high-resolution FPN layer improves small-object detection while maintaining efficiency, which makes the approach practical for real-world UAV applications.
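The exact form of the correlation-based disentanglement loss is not given here. The sketch below assumes one common choice: the mean squared Pearson correlation between every scale-invariant channel and every scale-related channel, which is zero when the two feature sets are linearly uncorrelated and one when they are perfectly (anti-)correlated. The function name `correlation_disentangle_loss` and the `(N, C)` feature layout (e.g. flattened spatial positions of one FPN level) are hypothetical, not taken from the paper.

```python
import numpy as np


def correlation_disentangle_loss(f_inv: np.ndarray, f_rel: np.ndarray, eps: float = 1e-8) -> float:
    """Hypothetical decorrelation loss between two feature sets.

    f_inv: (N, C1) scale-invariant features for N samples.
    f_rel: (N, C2) scale-related features for the same N samples.
    Returns the mean of squared Pearson correlations over all (C1, C2) channel pairs.
    """
    # Center each channel over the sample dimension.
    a = f_inv - f_inv.mean(axis=0, keepdims=True)
    b = f_rel - f_rel.mean(axis=0, keepdims=True)
    # Scale each channel to unit L2 norm so that a.T @ b yields Pearson correlations.
    a = a / (np.linalg.norm(a, axis=0, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=0, keepdims=True) + eps)
    corr = a.T @ b  # (C1, C2) cross-correlation matrix
    # Squaring penalizes positive and negative correlation equally.
    return float(np.mean(corr ** 2))


rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1))
print(correlation_disentangle_loss(x, -2 * x + 3))  # close to 1: fully (anti-)correlated
print(correlation_disentangle_loss(rng.normal(size=(10000, 4)), rng.normal(size=(10000, 4))))  # close to 0
```

Minimizing such a term pushes the two branches toward carrying linearly unrelated information; it does not by itself rule out nonlinear dependence, which is one reason to pair it with an adversarial objective.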
Abstract
Detecting objects from Unmanned Aerial Vehicles (UAVs) is often hindered by a large number of small objects, resulting in low detection accuracy. To address this issue, mainstream approaches typically rely on multi-stage inference. Despite their remarkable detection accuracy, these methods sacrifice real-time efficiency, making them less practical for real-world applications. To this end, we propose to improve single-stage detection accuracy by learning scale-invariant features. Specifically, a Scale-Invariant Feature Disentangling module is designed to disentangle scale-related and scale-invariant features. Then, an Adversarial Feature Learning scheme is employed to enhance the disentanglement. Finally, the scale-invariant features are leveraged for robust UAV-based object detection. Furthermore, we construct a multi-modal UAV object detection dataset, State-Air, which incorporates annotated UAV state parameters. We apply our approach to three lightweight detection frameworks on two benchmark datasets. Extensive experiments demonstrate that our approach effectively improves model accuracy and achieves state-of-the-art (SoTA) performance on both datasets. Our code and dataset will be publicly available once the paper is accepted.
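As a hedged illustration of how an adversarial scheme can enhance the disentanglement of scale-related and scale-invariant features, the sketch below uses a generic formulation, not necessarily the authors'. A linear scale discriminator tries to predict a scale bin from the scale-invariant features, while the feature extractor is rewarded for keeping scale information in the scale-related branch and penalized when the discriminator succeeds on the invariant branch. The `-lam * disc_loss` term affects the extractor's gradients in the same way as a gradient reversal layer. All names, the linear heads, the scale-bin labels, and `lam` are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))


def adversarial_losses(f_inv, f_rel, w_disc, w_rel, scale_labels, lam=0.1):
    """Hypothetical adversarial objectives for two feature branches.

    f_inv, f_rel: (N, C) scale-invariant / scale-related features.
    w_disc, w_rel: (C, K) linear heads predicting one of K scale bins.
    scale_labels: (N,) integer scale-bin labels (hypothetical supervision).
    """
    # The discriminator minimizes this: recover the scale bin from invariant features.
    disc_loss = cross_entropy(f_inv @ w_disc, scale_labels)
    # The scale-related branch is supervised to retain scale information.
    rel_loss = cross_entropy(f_rel @ w_rel, scale_labels)
    # The extractor minimizes this: keep scale info in f_rel, fool the discriminator on f_inv.
    extractor_loss = rel_loss - lam * disc_loss
    return disc_loss, extractor_loss


labels = np.array([0, 1, 2, 0, 1, 2])
feats = np.eye(3)[labels] * 10.0  # features that reveal the scale bin
w_zero = np.zeros((3, 3))
print(adversarial_losses(feats, feats, w_zero, w_zero, labels))     # uninformed discriminator
print(adversarial_losses(feats, feats, np.eye(3), w_zero, labels))  # discriminator succeeds
```

In the second call the discriminator recovers the scale bin from the invariant branch, so `disc_loss` drops toward zero and `extractor_loss` rises; training the extractor against that signal is what drives scale information out of the invariant features.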
