Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision

Yi Yu; Xue Yang; Qingyun Li; Feipeng Da; Jifeng Dai; Yu Qiao; Junchi Yan

TL;DR

This work tackles oriented object detection under weak supervision, using a single point annotation per object. It introduces Point2RBox, an end-to-end, one-stage detector that learns rotated box (RBox) regression through two key ideas: synthetic pattern knowledge combination, which overlays synthetic visual patterns with known boxes around labeled points to supervise size and orientation estimation, and transform self-supervision, which trains the output RBoxes to follow the same transformation (e.g., scaling or rotation) applied to the input image. The method also relies on a label assignment strategy adapted to the absence of object size in point annotations, along with practical training techniques. It reports competitive results among point-supervised methods on DOTA, DIOR, and HRSC, including 41.05% AP50 on DOTA with a CSPNeXt backbone, and compares favorably with Point-to-HBox-to-RBox baselines. The approach reduces annotation cost for oriented detection while keeping an end-to-end pipeline, which may ease deployment in aerial, text, and industrial settings.

Abstract

With the rapidly increasing demand for oriented object detection (OOD), recent research on weakly-supervised detectors that learn rotated boxes (RBox) from horizontal boxes (HBox) has attracted increasing attention. In this paper, we explore a more challenging yet label-efficient setting, namely single point-supervised OOD, and present our approach, called Point2RBox. Specifically, we propose to leverage two principles: 1) Synthetic pattern knowledge combination: by sampling around each labeled point on the image, we spread the object feature to synthetic visual patterns with known boxes to provide the knowledge for box regression. 2) Transform self-supervision: given a transformed input image (e.g., scaled or rotated), the output RBoxes are trained to follow the same transformation so that the network can perceive the relative size and rotation between objects. The detector is further enhanced by a few techniques devised to cope with peripheral issues, e.g., the anchor/layer assignment, since the size of the object is not available in our point supervision setting. To the best of our knowledge, Point2RBox is the first end-to-end solution for point-supervised OOD. In particular, our method uses a lightweight paradigm, yet it achieves competitive performance among point-supervised alternatives: 41.05%/27.62%/80.01% on the DOTA/DIOR/HRSC datasets.
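The transform self-supervision principle can be illustrated with a small geometric sketch. The code below is not the paper's loss; it only shows, under assumed conventions, what it means for a predicted RBox to "follow the same transformation" as the input image. The function names (`transform_rbox`, `angle_diff`, `consistency_error`), the `(cx, cy, w, h, theta)` box layout, the radian angle convention, and the L1-style error are all hypothetical choices made for illustration.

```python
import math


def transform_rbox(rbox, angle=0.0, scale=1.0, center=(0.0, 0.0)):
    """Map an RBox (cx, cy, w, h, theta) through a rotation by `angle`
    (radians) and a uniform scaling by `scale`, both about `center`.

    Assumed convention: theta is measured in the same direction as `angle`.
    """
    cx, cy, w, h, theta = rbox
    ox, oy = center
    dx, dy = cx - ox, cy - oy
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Rotate the box center about `center`, then scale its offset.
    new_cx = ox + scale * (cos_a * dx - sin_a * dy)
    new_cy = oy + scale * (sin_a * dx + cos_a * dy)
    # Width and height scale with the image; the angle shifts by `angle`.
    return (new_cx, new_cy, w * scale, h * scale, theta + angle)


def angle_diff(a, b, period=math.pi):
    """Signed difference a - b wrapped to [-period/2, period/2).

    A period of pi is assumed because a rectangle rotated by pi
    covers the same region.
    """
    return (a - b + period / 2) % period - period / 2


def consistency_error(pred_original, pred_transformed, angle, scale, center):
    """Illustrative L1-style gap between the prediction on the transformed
    view and the transformed prediction from the original view
    (size and angle terms only)."""
    target = transform_rbox(pred_original, angle, scale, center)
    size_err = (abs(pred_transformed[2] - target[2])
                + abs(pred_transformed[3] - target[3]))
    ang_err = abs(angle_diff(pred_transformed[4], target[4]))
    return size_err + ang_err


if __name__ == "__main__":
    box = (10.0, 0.0, 4.0, 2.0, 0.1)
    moved = transform_rbox(box, angle=math.pi / 2, scale=2.0)
    print(moved)
    print(consistency_error(box, moved, math.pi / 2, 2.0, (0.0, 0.0)))
```

A detector whose output on the rotated or scaled view matches `transform_rbox` applied to its output on the original view yields a consistency error near zero. Only size and angle appear in the error because, in this setting, the point annotation already fixes the object location.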

Paper Structure (13 sections, 15 equations, 3 figures, 5 tables)

Introduction
Related Work
Method
Synthetic Pattern Knowledge Combination
Transform Self-supervision
Loss Functions
Label Assignment
Inference Phase
Experiments
Settings and Datasets
Main Results
Ablation Studies
Conclusion

Figures (3)

Figure 1: Visual detection results based on the same ResNet50 [He2016Deep] backbone. The first row compares our method (Point2RBox-SK, AP$_{50}$ = 40.27; see the DOTA results table) with the Point-to-HBox-to-RBox pipeline powered by the state-of-the-art P2BNet (2022) [chen2022pointtobox] and H2RBox-v2 (2023) [yu2023h2rboxv2]. The second row shows the comparison with the Point-to-Mask-to-RBox method Point2Mask-RBox [li2023point2mask].
Figure 2: The training flowchart, consisting of synthetic pattern knowledge combination (see the Synthetic Pattern Knowledge Combination section) and transform self-supervision (see the Transform Self-supervision section). The core idea is to combine knowledge from synthetic patterns for size/angle estimation with that from annotated points for classification. The basic patterns are obtained under two different settings (see Fig. 3).
Figure 3: Two settings for obtaining basic patterns (see the Synthetic Pattern Knowledge Combination section) and an illustration of training images overlaid with synthetic patterns. SetRC: rectangles and circles with curve textures. SetSK: one simple sketch pattern for each category (see the pattern ablation table).
