Better Sampling, towards Better End-to-end Small Object Detection

Zile Huang, Chong Zhang, Mingyu Jin, Fangyu Wu, Chengzhi Liu, Xiaobo Jin

TL;DR

This work targets small object detection within end-to-end transformer-based detectors. It introduces three sampling-centric techniques—Sample Points Refinement (SPR), Scale-aligned Target (ST), and task-decoupled Sample Reweighting (SR)—to improve localization, classification, and training emphasis. Empirical results on VisDrone and SODA-D show consistent AP gains over state-of-the-art end-to-end detectors, validating the effectiveness of refined sampling and scale-aware confidence estimation for tiny objects. The proposed approach preserves inference speed while delivering notable improvements in challenging dense scenes with overlapping small targets.

Abstract

While deep learning-based general object detection has made significant strides in recent years, the effectiveness and efficiency of small object detection remain unsatisfactory. This stems not only from the limited features of small targets but also from their high density and mutual overlap. Existing transformer-based small object detectors do not adequately address the trade-off between accuracy and inference speed. To address these challenges, we propose methods that enhance sampling within an end-to-end framework. Sample Points Refinement (SPR) constrains localization and attention, preserving meaningful interactions in the region of interest and filtering out misleading information. Scale-aligned Target (ST) integrates scale information into the target confidence, improving classification for small object detection. A task-decoupled Sample Reweighting (SR) mechanism guides attention toward challenging positive examples, using a weight generator module to assess difficulty and adjust the classification loss based on decoder layer outcomes. Comprehensive experiments across various benchmarks show that our proposed detector excels at detecting small objects. Our model achieves a 2.9% increase in average precision (AP) over the state-of-the-art (SOTA) on the VisDrone dataset and a 1.7% improvement on the SODA-D dataset.

Paper Structure (21 sections, 15 equations, 5 figures, 8 tables)


Figures (5)

  • Figure 1: Visualization of our methods in challenging scenes with crowds and overlapping objects. The illustration highlights the effect of scale-aligned target, sample reweighting, and sample points refinement in such scenes, compared against a baseline method.
  • Figure 2: Overall framework. We propose a series of transferable methods that help end-to-end models perform better in small object detection. The final loss can be written as $\mathcal{L} = \mathcal{L}^{\textrm{cls}} + \mathcal{L}^{\textrm{reg}} + \mathcal{L}^{\textrm{offset}} + \mathcal{L}^{\textrm{atten}}$.
  • Figure 3: The IoU values in the two detection cases are equal, but the case on the right has a larger area ratio (predicted box area to ground-truth box area).
  • Figure 4: Illustration of the weight generator in the reweighting module. The kernel sizes of Conv$_{1}$, Conv$_{2}$, and Conv$_{3}$ are $C\times2C\times1\times1$, $C\times C\times1\times1$, and $1\times C\times1\times1$, respectively.
  • Figure 5: Comparison of heatmap visualizations between the baseline and our method.
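
The observation in the Figure 3 caption, that two predictions can share the same IoU while differing in area ratio, can be checked with a small worked example. The sketch below is illustrative only: the box coordinates and the helper names (`area`, `iou`, `area_ratio`) are assumptions made here, not values or code from the paper, and it does not reproduce the paper's Scale-aligned Target formula.

```python
from typing import Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_box = (max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3]))
    inter = area(inter_box)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def area_ratio(pred: Box, gt: Box) -> float:
    """Predicted box area divided by ground-truth box area."""
    return area(pred) / area(gt)


# Illustrative boxes (assumed, not taken from the paper).
gt = (0.0, 0.0, 10.0, 10.0)           # ground truth, area 100
pred_small = (0.0, 0.0, 10.0, 5.0)    # prediction lying inside the ground truth
pred_large = (0.0, 0.0, 20.0, 10.0)   # prediction enclosing the ground truth

for name, pred in [("small", pred_small), ("large", pred_large)]:
    print(name, "IoU =", iou(pred, gt), "area ratio =", area_ratio(pred, gt))
```

Both predictions reach an IoU of 0.5 against the same ground truth, yet one covers half the ground-truth area and the other twice it, so IoU alone cannot tell the two cases apart.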