Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
Yi Yu, Xue Yang, Qingyun Li, Feipeng Da, Jifeng Dai, Yu Qiao, Junchi Yan
TL;DR
This work tackles oriented object detection under weak supervision, using only a single point annotation per object. It introduces Point2RBox, an end-to-end, one-stage detector that learns RBox regression through two key ideas: synthetic pattern knowledge combination, which samples around each labeled point and adds synthetic visual patterns with known boxes to supervise size and orientation, and transform self-supervision, which trains the output RBoxes to follow the same transformation (e.g., scaling or rotation) applied to the input image. The method is further strengthened by a tailored label assignment strategy and practical training techniques, achieving competitive results on DOTA, DIOR, and HRSC, including 41.05% AP50 on DOTA with a CSPNeXt backbone and strong performance relative to point-to-HBox-to-RBox baselines. This approach substantially reduces annotation cost for oriented detection while preserving end-to-end efficiency, potentially enabling broader deployment in aerial, text, and industrial settings.
Abstract
With the rapidly increasing demand for oriented object detection (OOD), recent research on weakly-supervised detectors that learn rotated boxes (RBox) from horizontal boxes (HBox) has attracted increasing attention. In this paper, we explore a more challenging yet label-efficient setting, namely single point-supervised OOD, and present our approach called Point2RBox. Specifically, we propose to leverage two principles: 1) Synthetic pattern knowledge combination: By sampling around each labeled point on the image, we spread the object feature to synthetic visual patterns with known boxes to provide the knowledge for box regression. 2) Transform self-supervision: With a transformed input image (e.g., scaled/rotated), the output RBoxes are trained to follow the same transformation so that the network can perceive the relative size/rotation between objects. The detector is further enhanced by a few devised techniques to cope with peripheral issues, e.g., the anchor/layer assignment, as the size of the object is not available in our point supervision setting. To the best of our knowledge, Point2RBox is the first end-to-end solution for point-supervised OOD. In particular, our method uses a lightweight paradigm, yet it achieves competitive performance among point-supervised alternatives: 41.05%/27.62%/80.01% on the DOTA/DIOR/HRSC datasets.
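
The transform self-supervision principle can be illustrated with a small sketch. This is not the authors' implementation: the box layout `(cx, cy, w, h, theta)`, the helper names, and the loss form (log-ratio of sizes plus a wrapped angle difference) are all assumptions chosen for illustration. The idea shown is only that a prediction on a transformed view should equal the original prediction pushed through the same transformation.

```python
import math

# Hypothetical RBox layout for this sketch: (cx, cy, w, h, theta), theta in radians.

def rotate_rbox(box, angle, center=(0.0, 0.0)):
    """Map an RBox through a rotation of the image plane by `angle` about `center`."""
    cx, cy, w, h, theta = box
    ox, oy = center
    dx, dy = cx - ox, cy - oy
    c, s = math.cos(angle), math.sin(angle)
    # The center moves with the image; width and height are unchanged;
    # the box angle is shifted by the same rotation.
    return (ox + c * dx - s * dy, oy + s * dx + c * dy, w, h, theta + angle)

def scale_rbox(box, factor):
    """Map an RBox through a uniform scaling of the image about the origin."""
    cx, cy, w, h, theta = box
    return (cx * factor, cy * factor, w * factor, h * factor, theta)

def angle_diff(a, b, period=math.pi):
    """Signed difference a - b wrapped to [-period/2, period/2].

    A rectangle looks the same after a rotation by pi, hence the default period.
    """
    d = (a - b) % period
    return d - period if d > period / 2 else d

def consistency_loss(pred_orig, pred_trans, transform):
    """Assumed loss: gap between the transformed-view prediction and the
    original-view prediction mapped through `transform` (size and angle only)."""
    target = transform(pred_orig)
    size_term = (abs(math.log(pred_trans[2] / target[2]))
                 + abs(math.log(pred_trans[3] / target[3])))
    angle_term = abs(angle_diff(pred_trans[4], target[4]))
    return size_term + angle_term
```

For example, if the network predicts `(10, 0, 4, 2, 0.1)` on the original image and the image is rotated by a quarter turn, a consistent prediction on the rotated view has the same width and height and an angle of `0.1 + pi/2`, giving a loss near zero. Only size and angle are penalized here because, per the abstract, the goal is for the network to perceive relative size and rotation; the object position is already given by the point label.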
