Tiny Object Detection with Single Point Supervision
Haoran Zhu, Chang Xu, Ruixiang Zhang, Fang Xu, Wen Yang, Haijian Zhang, Gui-Song Xia
TL;DR
This work tackles the high cost of bounding-box annotation for tiny objects by enabling end-to-end detection with single-point supervision. It introduces Point Teacher, a two-phase denoising framework that converts noisy point annotations into accurate pseudo boxes through Spatial-aware Box Generation and Noise-aware Label Evolution, aided by Dynamic Multiple Instance Learning and a Jittering IoU Loss. The method uses a teacher-student setup and a top-K point-matching mechanism to progressively refine boxes, and it integrates with detectors via a Top-down FPN Aggregation and Scale-invariant Label Assignment, supporting both horizontal and oriented boxes. Experiments on AI-TOD-v2, SODA-A, and TinyPerson show substantial gains over existing PSOD approaches and performance competitive with box-supervised methods, highlighting robustness to point-location noise and potential for large-scale, annotation-efficient tiny object detection in aerial imagery.
Abstract
Tiny objects, with their limited spatial resolution, often resemble point-like distributions. As a result, bounding box prediction using point-level supervision emerges as a natural and cost-effective alternative to traditional box-level supervision. However, the small scale and lack of distinctive features of tiny objects make point annotations prone to noise, posing significant hurdles for model robustness. To tackle these challenges, we propose Point Teacher, the first end-to-end point-supervised method for robust tiny object detection in aerial images. To handle label noise from scale ambiguity and location shifts in point annotations, Point Teacher employs a teacher-student architecture and decouples learning into a two-phase denoising process. In this framework, the teacher network progressively denoises the pseudo boxes derived from noisy point annotations, guiding the student network's learning. Specifically, in the first phase, random masking of image regions facilitates regression learning, enabling the teacher to transform noisy point annotations into coarse pseudo boxes. In the second phase, these coarse pseudo boxes are refined using dynamic multiple instance learning, which adaptively selects the most reliable instance from dynamically constructed proposal bags around the coarse pseudo boxes. Extensive experiments on three tiny object datasets (i.e., AI-TOD-v2, SODA-A, and TinyPerson) validate the proposed method's effectiveness and robustness against point location shifts. Notably, relying solely on point supervision, Point Teacher already achieves performance comparable to box-supervised learning methods. Code and models will be made publicly available.
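To make the second-phase idea concrete, the following is a minimal sketch of selecting one instance from a proposal bag built around a coarse pseudo box. It is an illustration under stated assumptions, not the Point Teacher implementation: the function names (`build_proposal_bag`, `select_most_reliable`, `iou`), the fixed grid of scale and shift factors, and the toy scoring function are all hypothetical. In the method described above, the bag is constructed dynamically and the reliability score would come from the teacher network rather than from a reference box.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# A box is (cx, cy, w, h): center coordinates, width, height.
Box = Tuple[float, float, float, float]


def build_proposal_bag(
    coarse_box: Box,
    scales: Sequence[float] = (0.8, 1.0, 1.2),    # assumed jitter factors
    shifts: Sequence[float] = (-0.25, 0.0, 0.25),  # assumed, as a fraction of box size
) -> List[Box]:
    """Enumerate scaled and shifted variants of a coarse pseudo box."""
    cx, cy, w, h = coarse_box
    bag = []
    for sw, sh, dx, dy in product(scales, scales, shifts, shifts):
        bag.append((cx + dx * w, cy + dy * h, w * sw, h * sh))
    return bag


def select_most_reliable(bag: List[Box], score_fn: Callable[[Box], float]) -> Box:
    """Return the bag instance with the highest score (the MIL selection step)."""
    return max(bag, key=score_fn)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    coarse = (10.0, 10.0, 4.0, 4.0)
    # Toy stand-in for a learned reliability score: overlap with a reference box.
    reference = (11.0, 10.0, 4.0, 4.0)
    refined = select_most_reliable(build_proposal_bag(coarse), lambda b: iou(b, reference))
    print(refined)
```

Because the unmodified coarse box is always a member of the bag (scale 1.0, shift 0.0), the selected instance never scores lower than the coarse box under whatever scoring function is supplied.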
