Similarity Distance-Based Label Assignment for Tiny Object Detection

Shuohao Shi, Qiang Fang, Tong Zhao, Xin Xu

TL;DR

This work rethinks label assignment for tiny object detection with Similarity Distance (SimD), a metric that jointly captures location and shape similarity between bounding boxes. SimD normalizes both terms with parameters derived from the dataset, so no hyperparameters need to be set by hand, and it replaces IoU in both label assignment (MaxSimDAssigner) and NMS. The aim is to select higher-quality positive samples rather than simply more of them, since excessive positives can lead to more false positives. Experiments on AI-TOD, AI-TODv2, VisDrone2019, and SODA-D show gains over IoU-based methods, including 1.8 AP points overall and 4.1 AP points on very tiny objects over the state-of-the-art competitors on AI-TOD. The approach applies to common anchor-based detectors, and the code is publicly available.

Abstract

Tiny object detection is becoming one of the most challenging tasks in computer vision because of the limited object size and lack of information. The label assignment strategy is a key factor affecting the accuracy of object detection. Although there are some effective label assignment strategies for tiny objects, most of them focus on reducing the sensitivity to the bounding boxes in order to increase the number of positive samples, and they rely on fixed hyperparameters that must be set manually. However, more positive samples do not necessarily lead to better detection results; in fact, excessive positive samples may lead to more false positives. In this paper, we introduce a simple but effective strategy named the Similarity Distance (SimD) to evaluate the similarity between bounding boxes. The proposed strategy not only considers both location and shape similarity but also learns its hyperparameters adaptively, so that it can adapt to different datasets and to the various object sizes within a dataset. Our approach can be applied directly in common anchor-based detectors in place of the IoU for label assignment and Non-Maximum Suppression (NMS). Extensive experiments on four mainstream tiny object detection datasets demonstrate the superior performance of our method; in particular, on AI-TOD it is 1.8 AP points higher overall and 4.1 AP points higher on very tiny objects than the state-of-the-art competitors. Code is available at: https://github.com/cszzshi/SimD.
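
The abstract says that SimD combines location and shape similarity with adaptively learned normalization, but it does not give the formula. As a rough illustration only, the sketch below assumes one plausible form: the exponential of the negative sum of a normalized center distance and a normalized size difference. The function name `simd`, the `(cx, cy, w, h)` box layout, and the constants `m` and `n` are hypothetical and are not taken from the paper or its repository.

```python
import math


def simd(gt, anchor, m=1.0, n=1.0):
    """Similarity between two boxes given as (cx, cy, w, h).

    ASSUMPTION: m and n stand in for the adaptively learned normalization
    parameters mentioned in the abstract. How they are derived from the
    dataset is not stated there, so they are plain arguments here.
    """
    xg, yg, wg, hg = gt
    xa, ya, wa, ha = anchor
    # The normalizers scale with the sizes of both boxes, so a fixed pixel
    # offset costs a tiny box more than it costs a large one.
    norm_w = (wg + wa) / m
    norm_h = (hg + ha) / n
    # Location term: normalized distance between the two box centers.
    location = math.hypot((xg - xa) / norm_w, (yg - ya) / norm_h)
    # Shape term: normalized difference in width and height.
    shape = math.hypot((wg - wa) / norm_w, (hg - ha) / norm_h)
    # Map the combined distance into (0, 1]; identical boxes score 1.0.
    return math.exp(-(location + shape))
```

Under these assumptions, two 4x4 boxes whose centers are 6 pixels apart do not overlap, so their IoU is zero, yet `simd((10, 10, 4, 4), (16, 10, 4, 4))` still returns a nonzero score that shrinks smoothly as the offset grows. That graded response for non-overlapping boxes is the property that makes a distance-style metric attractive for tiny objects.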

Paper Structure (15 sections, 5 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Comparison between traditional label assignment metrics and our SimD metric. The first row shows typical detection results achieved with these methods, and the second row presents diagrammatic sketches of the metrics. In SimD, $\Delta w$ and $\Delta h$ denote the differences in width and height between the anchor and the ground truth. The green, blue, and red boxes denote true positive (TP), false positive (FP), and false negative (FN) predictions, respectively.
  • Figure 2: The processing flow of the SimD-based label assignment strategy. We first obtain the coordinates of the ground truth and anchors and then calculate the Similarity Distance (SimD) between the ground truth and each anchor. Subsequently, we follow the traditional label assignment strategy to obtain positive and negative samples in accordance with corresponding thresholds. For a ground truth that does not have any associated positive sample based on this strategy, we assign the anchor with the maximum SimD value as a positive sample, as long as this SimD value is larger than a minimum positive threshold.
  • Figure 3: Comparison of detection results on the AI-TOD dataset between label assignment with the traditional IoU metric and with SimD. The first row shows the results of Faster R-CNN with the IoU metric, and the second row shows the results of Faster R-CNN with the SimD metric. The green, blue, and red boxes denote true positive (TP), false positive (FP), and false negative (FN) predictions, respectively. The improvement achieved with our method is clearly visible.
  • Figure 4: Typical detection results on the VisDrone2019 val set, which contains both tiny and general-sized objects.
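
The assignment flow described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the repository's MaxSimDAssigner: the function name `assign_labels`, the three threshold values, and the label codes `-1` (negative) and `-2` (ignored) are assumptions made for the example. The input is a precomputed similarity matrix, so any box similarity can be plugged in. As a simplification, the fallback step only considers anchors that are not already positive for another ground truth.

```python
def assign_labels(sim, pos_thr=0.7, neg_thr=0.3, min_pos_thr=0.3):
    """Label each anchor with a ground-truth index, -1 (negative) or -2 (ignored).

    sim[g][a] is the similarity between ground truth g and anchor a.
    ASSUMPTION: all three thresholds are illustrative, not the paper's settings.
    """
    if not sim:
        return []
    num_gt, num_anchors = len(sim), len(sim[0])
    labels = [-2] * num_anchors

    # Step 1: threshold each anchor against its best-matching ground truth.
    for a in range(num_anchors):
        best_g = max(range(num_gt), key=lambda g: sim[g][a])
        best = sim[best_g][a]
        if best >= pos_thr:
            labels[a] = best_g   # positive sample for ground truth best_g
        elif best < neg_thr:
            labels[a] = -1       # negative (background) sample

    # Step 2: a ground truth left without positives receives its most similar
    # anchor, provided that similarity exceeds the minimum positive threshold.
    for g in range(num_gt):
        if g in labels:
            continue
        candidates = [a for a in range(num_anchors) if labels[a] < 0]
        if not candidates:
            continue
        best_a = max(candidates, key=lambda a: sim[g][a])
        if sim[g][best_a] > min_pos_thr:
            labels[best_a] = g
    return labels
```

With two ground truths and four anchors, `assign_labels([[0.9, 0.5, 0.1, 0.2], [0.1, 0.2, 0.25, 0.5]])` marks anchor 0 as positive for the first ground truth in step 1. The second ground truth has no anchor above `pos_thr`, so step 2 promotes anchor 3, its best match at 0.5.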