Generalized Small Object Detection: A Point-Prompted Paradigm and Benchmark

Haoran Zhu, Wen Yang, Guangyou Yang, Chang Xu, Ruixiang Zhang, Fang Xu, Haijian Zhang, Gui-Song Xia

Abstract

Small object detection (SOD) remains challenging because small objects occupy extremely few pixels and have ambiguous boundaries. These characteristics make annotation difficult, limit the availability of large-scale, high-quality datasets, and leave small objects with inherently weak semantic representations. In this work, we first address the data limitation by introducing TinySet-9M, the first large-scale, multi-domain dataset for small object detection. Beyond filling this gap, we establish a benchmark to evaluate how well existing label-efficient detection methods handle small objects. Our evaluation reveals that weak visual cues further exacerbate the performance degradation of label-efficient methods on small objects, highlighting a critical challenge in label-efficient SOD. Second, to tackle insufficient semantic representation, we move beyond training-time feature enhancement and propose a new paradigm termed Point-Prompt Small Object Detection (P2SOD). This paradigm introduces sparse point prompts at inference time as an efficient information bridge for category-level localization, enabling semantic augmentation. Building upon the P2SOD paradigm and the large-scale TinySet-9M dataset, we further develop DEAL (DEtect Any smalL object), a scalable and transferable point-prompted detection framework that learns robust, prompt-conditioned representations from large-scale data. With only a single click at inference time, DEAL achieves a 31.4% relative improvement over fully supervised baselines under strict localization metrics (e.g., AP75) on TinySet-9M, while generalizing effectively to unseen categories and unseen datasets. Our project is available at https://zhuhaoraneis.github.io/TinySet-9M/.
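
To illustrate why strict localization metrics such as AP75 are demanding for small objects, the following minimal sketch compares the IoU of a small and a large box under the same pixel offset. The box sizes (20 and 200 pixels) and the 3-pixel shift are illustrative assumptions, not values taken from the paper's experiments, and the helper names `iou` and `shifted_iou` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(side, shift):
    """IoU between a square ground-truth box and the same box shifted diagonally."""
    gt = (0.0, 0.0, float(side), float(side))
    pred = (float(shift), float(shift), float(side + shift), float(side + shift))
    return iou(gt, pred)


if __name__ == "__main__":
    # Illustrative sizes only: the same 3-pixel error on a small and a large box.
    for side in (20, 200):
        value = shifted_iou(side, shift=3)
        verdict = "passes" if value >= 0.75 else "fails"
        print(f"side={side:>3}px  IoU={value:.3f}  {verdict} the 0.75 threshold")
```

Under these assumptions, the 3-pixel offset drops the 20-pixel box below the 0.75 IoU threshold while the 200-pixel box stays well above it, which is one way to see why a localization hint such as a point prompt could matter more at small scales.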

Paper Structure

This paper contains 19 sections, 13 equations, 10 figures, 6 tables, 1 algorithm.

Figures (10)

  • Figure 1: Overview of our study on generalized small object detection. Leveraging the proposed TinySet-9M dataset and benchmark, we systematically investigate the performance of existing label-efficient paradigms in the small-object regime and introduce a new detection paradigm, Point-prompt Small Object Detection (P$^2$SOD). The middle panel illustrates the domain composition of TinySet-9M, while the right panel compares the performance of representative label-efficient paradigms on small objects and our proposed detection paradigm.
  • Figure 2: Statistical analysis of the TinySet-9M dataset. (a) shows the composition of TinySet-9M, where different source datasets form distinct sub-domains; (b) presents the distribution of object scales, in which the green line indicates the average object scale of the dataset (20.4) and the red line denotes the scale threshold for small objects; (c) illustrates the distribution of object density across images; (d) compares TinySet-9M and Objects365 in terms of the proportion of small objects, object density, average object scale, number of instances, and number of images.
  • Figure 3: Visualization of the TinySet-9M dataset. The dataset contains a large number of densely distributed objects with extremely small spatial scales, which results in blurred object boundaries and weak semantic representations.
  • Figure 4: Detection results of different detection paradigms on the TinySet-9M dataset. Orange boxes denote detections with confidence higher than 0.2.
  • Figure 5: Overview of the proposed method. (a) Illustration of the overall architecture of DEAL, which is built upon the RT-DETR framework and extended to support point-prompted small object detection. (b) Illustration of the training pipeline of the proposed Prediction-Guided Cyclic Point Prompting (PG-CPP) strategy.
  • ...and 5 more figures