Boost UAV-based Object Detection via Scale-Invariant Feature Disentanglement and Adversarial Learning

Fan Liu, Liang Yao, Chuanyi Zhang, Ting Wu, Xinlei Zhang, Xiruo Jiang, Jun Zhou

TL;DR

This work tackles the challenge of detecting small objects in UAV imagery under real-time constraints. It introduces SIFDAL, a plug-and-play framework that combines Scale-Invariant Feature Disentangling (SIFD) with Adversarial Feature Learning (AFL) to extract discriminative scale-invariant features from the high-resolution FPN layer, aided by height-based supervision and correlation-based disentanglement losses. The approach yields consistent accuracy gains across lightweight detectors and datasets and achieves SoTA performance on two UAV benchmarks. It is complemented by State-Air, a multi-modal UAV dataset that includes UAV flight status data. The results show that focusing on scale-invariant representations in the high-resolution FPN layer improves small-object detection while preserving efficiency, which makes the method practical for real-world UAV applications.
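
The TL;DR mentions height-based supervision, and the Figure 2 caption states that the scale-related feature learner is trained through height level estimation with altitude labels as supervision. The Python sketch below shows one plausible way to turn continuous altitude labels into discrete height levels and score predicted level logits with softmax cross-entropy. The bin edges (`HEIGHT_EDGES_M`), the metre unit, and the helper names are illustrative assumptions, not the paper's definitions.

```python
import bisect
import math
from typing import Sequence

# Hypothetical altitude bin edges in metres; the paper's actual height levels may differ.
HEIGHT_EDGES_M = [20.0, 40.0, 60.0]  # -> 4 levels: <20, [20,40), [40,60), >=60


def altitude_to_level(altitude_m: float, edges_m: Sequence[float] = HEIGHT_EDGES_M) -> int:
    """Map a continuous altitude label to a discrete height-level index."""
    # bisect_right counts how many edges are <= altitude_m.
    return bisect.bisect_right(edges_m, altitude_m)


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def height_level_loss(batch_logits: Sequence[Sequence[float]],
                      altitudes_m: Sequence[float],
                      edges_m: Sequence[float] = HEIGHT_EDGES_M) -> float:
    """Mean cross-entropy between predicted height-level logits and altitude-derived labels."""
    if len(batch_logits) != len(altitudes_m) or not altitudes_m:
        raise ValueError("need exactly one altitude label per prediction")
    targets = [altitude_to_level(a, edges_m) for a in altitudes_m]
    losses = [cross_entropy(z, t) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


print([altitude_to_level(a) for a in (10.0, 35.0, 75.0)])  # -> [0, 1, 3]
# A confident, correct prediction (level 1 for a 35 m altitude) yields a small loss.
print(height_level_loss([[0.0, 5.0, 0.0, 0.0]], [35.0]))
```

Discretizing altitude into levels turns the supervision into a classification task; a regression loss on raw altitude would be an equally plausible alternative under these assumptions.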

Abstract

Detecting objects from Unmanned Aerial Vehicles (UAVs) is often hindered by the large number of small objects, which results in low detection accuracy. To address this issue, mainstream approaches typically rely on multi-stage inference. Despite their remarkable detection accuracy, they sacrifice real-time efficiency, which makes them less practical for real-world applications. To this end, we propose to improve single-stage inference accuracy by learning scale-invariant features. Specifically, a Scale-Invariant Feature Disentangling module is designed to disentangle scale-related and scale-invariant features. An Adversarial Feature Learning scheme is then employed to enhance the disentanglement. Finally, the scale-invariant features are leveraged for robust UAV-based object detection. Furthermore, we construct a multi-modal UAV object detection dataset, State-Air, which incorporates annotated UAV state parameters. We apply our approach to three lightweight detection frameworks on two benchmark datasets. Extensive experiments demonstrate that our approach effectively improves model accuracy and achieves state-of-the-art (SoTA) performance on both datasets. Our code and dataset will be made publicly available once the paper is accepted.
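
The Figure 2 caption states that SIFD disentangles scale-related and scale-invariant features, $x_{\mathcal{M}_{\theta}}^{rel}$ and $x_{\mathcal{M}_{\theta}}^{inv}$, by minimizing the correlation coefficient $\rho$. The Python sketch below shows a minimal decorrelation penalty under the assumption that $\rho$ is the Pearson correlation $\rho = \operatorname{cov}(x^{rel}, x^{inv}) / (\sigma_{rel}\,\sigma_{inv})$ between the two flattened feature vectors and that the loss is $|\rho|$. The exact form (absolute value versus square, global versus per-channel statistics) and the function names are assumptions.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float], eps: float = 1e-8) -> float:
    """Pearson correlation coefficient between two equal-length feature vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # eps guards against division by zero for constant features.
    return cov / (math.sqrt(var_x * var_y) + eps)


def disentanglement_loss(x_rel: Sequence[float], x_inv: Sequence[float]) -> float:
    """Hypothetical decorrelation loss: |rho| is 0 when the features are linearly uncorrelated."""
    return abs(pearson_correlation(x_rel, x_inv))


# Perfectly (anti-)correlated features give a loss near 1; uncorrelated ones give 0.
print(round(disentanglement_loss([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]), 4))
print(round(disentanglement_loss([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]), 4))
```

Note that driving $\rho$ to zero removes only linear dependence between the two features, not all statistical dependence.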

Paper Structure (32 sections, 7 equations, 9 figures, 7 tables)

Figures (9)

  • Figure 1: Comparison of general (VOC) and UAV (VisDrone, State-Air) datasets. The object scale is normalized as the ratio of the object's actual area to the area of the source image. Proportion represents the percentage of objects at each scale in the overall dataset. Most objects in UAV datasets tend to be small-scale.
  • Figure 2: Overview of our proposed approach. State-Air is a multi-scene and multi-modal UAV-OD dataset that incorporates UAV state parameters. Our SIFDAL consists of a SIFD module and an AFL training method. SIFD leverages a scale-related feature learner $\mathcal{F}_{learner}$ to extract scale-related features through height level estimation with altitude labels as supervision. Then, it disentangles scale-related and scale-invariant features ($x_{\mathcal{M}_{\theta}}^{rel}$ and $x_{\mathcal{M}_{\theta}}^{inv}$) by minimizing the correlation coefficient $\rho$. AFL is utilized to enhance feature disentanglement by adversarial training. Finally, scale-invariant features are employed to detect objects.
  • Figure 3: Visualization of heat maps in different FPN layers of YOLOv7-L. The resolution of the feature map decreases sequentially from P3 to P5. The detection head on the P3 layer is responsible for the majority of objects in the drone's field of view.
  • Figure 4: (a) Annotation comparison among State-Air, AU-AIR, and SynDrone. Green: labels given by the datasets; Yellow: revision of incorrect labels; Red: missed labels; Blue: negative labels. (b) State-Air's distribution of scene (outer) and object category (inner).
  • Figure 5: (a) Parameter analysis of $\lambda_1$ and $\lambda_2$. (b) The impact of applying the SIFD module to each FPN layer of YOLOv7-L.
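
Figure 1 normalizes object scale as the ratio of an object's area to the area of its source image and reports the proportion of objects at each scale. The Python sketch below reproduces that computation under two assumptions: the bounding-box area stands in for the object's actual area, and the scale bins (`SCALE_BINS`) use illustrative thresholds rather than the paper's. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical upper bounds on the normalized area ratio for each scale bin.
SCALE_BINS: List[Tuple[str, float]] = [
    ("tiny", 0.001),
    ("small", 0.01),
    ("medium", 0.1),
    ("large", 1.0),
]

# One annotation: (box_w, box_h, img_w, img_h) in pixels.
Annotation = Tuple[float, float, float, float]


def normalized_scale(box_w: float, box_h: float, img_w: float, img_h: float) -> float:
    """Ratio of the object's (box) area to the area of its source image."""
    return (box_w * box_h) / (img_w * img_h)


def scale_bin(ratio: float) -> str:
    """Return the first bin whose upper bound contains the ratio."""
    for name, upper in SCALE_BINS:
        if ratio <= upper:
            return name
    return SCALE_BINS[-1][0]  # ratios above 1.0 fall into the last bin


def scale_proportions(annotations: Sequence[Annotation]) -> Dict[str, float]:
    """Fraction of all objects that fall into each scale bin."""
    counts = Counter(scale_bin(normalized_scale(*ann)) for ann in annotations)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name, _ in SCALE_BINS}
    return {name: counts[name] / total for name, _ in SCALE_BINS}


# Usage: five objects in a 1000x1000 image, two of them tiny.
anns = [
    (10, 10, 1000, 1000),    # ratio 0.0001 -> tiny
    (20, 20, 1000, 1000),    # ratio 0.0004 -> tiny
    (50, 50, 1000, 1000),    # ratio 0.0025 -> small
    (200, 200, 1000, 1000),  # ratio 0.04   -> medium
    (800, 800, 1000, 1000),  # ratio 0.64   -> large
]
print(scale_proportions(anns))  # {'tiny': 0.4, 'small': 0.2, 'medium': 0.2, 'large': 0.2}
```

Normalizing by image area makes scales comparable across datasets with different image resolutions, which is what allows VOC and the UAV datasets to share one axis in Figure 1.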