Oriented Tiny Object Detection: A Dataset, Benchmark, and Dynamic Unbiased Learning

Chang Xu, Ruixiang Zhang, Wen Yang, Haoran Zhu, Fang Xu, Jian Ding, Gui-Song Xia

TL;DR

This work tackles oriented tiny object detection, a setting with extremely small objects (mean object size of $10.6^{2}$ pixels) and arbitrary orientation, by introducing AI-TOD-R, a challenging dataset, a corresponding benchmark, and a Dynamic Coarse-to-Fine Learning (DCFL) pipeline. DCFL combines a dynamic Prior Capturing Block, which adapts priors to object extents, with a two-stage sampling regime (coarse positive sampling across scales, followed by finer posterior matching using a Dynamic Gaussian Mixture Model) to overcome bias against tiny objects. Across eight heterogeneous benchmarks, DCFL delivers state-of-the-art accuracy without additional inference cost, demonstrating its versatility for one-stage and two-stage detectors and its effectiveness under fully-supervised and label-efficient settings; notable gains include improvements in $AP_{0.5}$ and robustness to extreme object sizes. The practical impact lies in enabling reliable detection of densely packed, orientation-variant tiny objects in aerial and remote sensing imagery, with open-source code facilitating adoption and further research; future work may extend to open-world settings, multi-modality, and foundation-model integration. $AP$- and $IoU$-based metrics are used throughout, and the key ideas are expressed through the DGMM, GJSD, and dynamic priors that adapt during training.
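The summary above does not spell out how an oriented box enters a Gaussian-based matching score. As a minimal sketch, the code below uses a convention that is common in oriented object detection but is an assumption here, not something taken from the paper: the box center is the Gaussian mean, and the covariance is $R\,\mathrm{diag}(w^{2}/4,\,h^{2}/4)\,R^{\top}$ for rotation angle $\theta$. The function names `obb_to_gaussian` and `gaussian_density` are hypothetical, and the 16 x 7 pixel box is an illustrative choice whose area (112 pixels) is close to the stated mean object size of $10.6^{2}$ pixels.

```python
import math


def obb_to_gaussian(cx, cy, w, h, theta):
    """Model an oriented box as a 2-D Gaussian (assumed convention).

    mean = (cx, cy); covariance = R diag(w^2/4, h^2/4) R^T,
    where R rotates by theta (radians).
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = (w / 2.0) ** 2, (h / 2.0) ** 2  # variances along the box axes
    sxx = a * c * c + b * s * s
    sxy = (a - b) * c * s
    syy = a * s * s + b * c * c
    return (cx, cy), ((sxx, sxy), (sxy, syy))


def gaussian_density(point, mean, cov):
    """Evaluate the 2-D Gaussian density at a point (closed-form 2x2 inverse)."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    # Squared Mahalanobis distance of the point from the mean.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


# Hypothetical tiny object: 16 x 7 pixels, axis-aligned, centered at (50, 50).
mean, cov = obb_to_gaussian(50.0, 50.0, 16.0, 7.0, 0.0)

# Candidate sample locations at the same pixel distance from the center.
score_long_axis = gaussian_density((53.0, 50.0), mean, cov)
score_short_axis = gaussian_density((50.0, 53.0), mean, cov)
```

Under this modelling choice, a candidate location displaced along the long axis of the box scores higher than one displaced by the same distance along the short axis, which is one way a shape-aware score can differ from a purely center-distance-based one.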

Abstract

Detecting oriented tiny objects, which are limited in appearance information yet prevalent in real-world applications, remains an intricate and under-explored problem. To address this, we systematically introduce a new dataset, a benchmark, and a dynamic coarse-to-fine learning scheme in this study. Our proposed dataset, AI-TOD-R, features the smallest object sizes among all oriented object detection datasets. Based on AI-TOD-R, we present a benchmark spanning a broad range of detection paradigms, including both fully-supervised and label-efficient approaches. Through investigation, we identify a learning bias present across various learning pipelines: confident objects become increasingly confident, while vulnerable oriented tiny objects are further marginalized, hindering their detection performance. To mitigate this issue, we propose a Dynamic Coarse-to-Fine Learning (DCFL) scheme to achieve unbiased learning. DCFL dynamically updates prior positions to better align with the limited areas of oriented tiny objects, and it assigns samples in a way that balances both quantity and quality across different object shapes, thus mitigating biases in prior settings and sample selection. Extensive experiments across eight challenging object detection datasets demonstrate that DCFL achieves state-of-the-art accuracy, high efficiency, and remarkable versatility. The dataset, benchmark, and code are available at https://chasel-tsui.github.io/AI-TOD-R/.

Paper Structure

This paper contains 25 sections, 14 equations, 8 figures, 11 tables.

Figures (8)

  • Figure 1: This paper systematically introduces the challenging task of oriented tiny object detection, with the AI-TOD-R dataset, benchmark, and a dynamic coarse-to-fine learning pipeline. Upper: Typical annotation examples from AI-TOD-R and detection paradigms covered by this benchmark, where "L.", "U.", "S. L.", and "C. L." denote labelled, unlabelled, sparsely labelled, and coarsely labelled images, respectively. Lower: A comparison of learning paradigms for oriented object detection. Compared to prior art (left), our proposed pipeline (right) mitigates the learning bias against oriented tiny objects with a dynamically updated prior and a coarse-to-fine sample learning scheme.
  • Figure 2: Statistical analysis of AI-TOD-R. From left to right, we show the dataset's object size distribution, object angle distribution, distribution of object count per image, and class size distribution, respectively. The box plot of "Class Size Distribution" shows the mean and standard deviation of absolute object size within each class.
  • Figure 3: The labelling process of AI-TOD-R. The coarse labels are automatically generated by H2RBox-v2, and the final labels are obtained by manual labelling and verification.
  • Figure 4: Visualization of annotations in AI-TOD-R. Compared to AI-TOD-v2, using oriented bounding boxes to represent tiny objects can significantly reduce background noise, and this advantage is particularly obvious in densely arranged scenarios. In addition to the extremely tiny object size, AI-TOD-R introduces other challenges such as dense arrangement, weak feature representation, and imbalanced class distributions.
  • Figure 5: An illustration of the sample learning bias. SOOD [sood_cvpr_2023] is trained with 10% labels under the semi-supervised object detection pipeline.
  • ...and 3 more figures