Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances

Yi Yu, Botao Ren, Peiyuan Zhang, Mingxin Liu, Junwei Luo, Shaofeng Zhang, Feipeng Da, Junchi Yan, Xue Yang

TL;DR

Point2RBox-v2 tackles point-supervised oriented object detection by exploiting the spatial layout among instances. It introduces a pair of layout-based losses—Gaussian overlap and Voronoi watershed—along with a consistency loss, edge guidance, and copy-paste augmentation to tightly constrain object size and orientation without heavy priors. Across diverse remote-sensing and retail datasets, the method achieves state-of-the-art results in end-to-end and pseudo-label settings, notably surpassing prior point-supervised approaches in dense scenes and approaching RBox-supervised performance in several benchmarks. The approach remains robust to annotation noise and does not require pre-trained priors, offering a practical, lightweight path toward accurate oriented detection from points, with clear limitations on sparse categories that lack layout cues.

Abstract

With the rapidly increasing demand for oriented object detection (OOD), recent research on weakly-supervised detectors that learn OOD from point annotations has attracted great attention. In this paper, we rethink this challenging task setting with the layout among instances and present Point2RBox-v2. At the core are three principles: 1) Gaussian overlap loss. It learns an upper bound for each instance by treating objects as 2D Gaussian distributions and minimizing their overlap. 2) Voronoi watershed loss. It learns a lower bound for each instance through watershed on Voronoi tessellation. 3) Consistency loss. It learns the size/rotation variation between two output sets with respect to an input image and its augmented view. The detector is further enhanced by a few additional techniques, e.g. edge loss and copy-paste. To the best of our knowledge, Point2RBox-v2 is the first approach to explore the spatial layout among instances for learning point-supervised OOD. Our solution is elegant and lightweight, yet it gives competitive performance, especially in densely packed scenes: 62.61%/86.15%/34.71% on DOTA/HRSC/FAIR1M. Code is available at https://github.com/VisionXLab/point2rbox-v2.
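To make the Gaussian overlap idea more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each rotated box (cx, cy, w, h, theta) maps to a 2D Gaussian with covariance R diag(w²/4, h²/4) Rᵀ, and it assumes the Bhattacharyya coefficient as the overlap measure; the abstract specifies neither choice. The function names `rbox_to_gaussian`, `bhattacharyya_coefficient`, and `overlap_penalty` are hypothetical.

```python
import numpy as np


def rbox_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box to (mean, covariance). The w^2/4, h^2/4 scaling is an assumption."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    scale = np.diag([w ** 2 / 4.0, h ** 2 / 4.0])
    return np.array([cx, cy], dtype=float), rot @ scale @ rot.T


def bhattacharyya_coefficient(mu1, sigma1, mu2, sigma2):
    """Overlap in (0, 1] between two Gaussians; equals 1 when they are identical."""
    sigma = 0.5 * (sigma1 + sigma2)
    diff = mu1 - mu2
    # Mahalanobis-like term of the Bhattacharyya distance.
    term_mean = 0.125 * diff @ np.linalg.solve(sigma, diff)
    # Covariance-mismatch term of the Bhattacharyya distance.
    term_cov = 0.5 * np.log(
        np.linalg.det(sigma)
        / np.sqrt(np.linalg.det(sigma1) * np.linalg.det(sigma2))
    )
    return float(np.exp(-(term_mean + term_cov)))


def overlap_penalty(boxes):
    """Sum of pairwise overlaps; minimizing it discourages boxes from growing into neighbors."""
    gaussians = [rbox_to_gaussian(*box) for box in boxes]
    total = 0.0
    for i in range(len(gaussians)):
        for j in range(i + 1, len(gaussians)):
            total += bhattacharyya_coefficient(*gaussians[i], *gaussians[j])
    return total


# Two neighboring instances: oversized predictions overlap more than tight ones.
large = [(0.0, 0.0, 4.0, 2.0, 0.0), (3.0, 0.0, 4.0, 2.0, 0.0)]
small = [(0.0, 0.0, 2.0, 1.0, 0.0), (3.0, 0.0, 2.0, 1.0, 0.0)]
print(overlap_penalty(large), overlap_penalty(small))
```

In this sketch the penalty falls as the predicted boxes shrink away from their neighbors, which is the sense in which an overlap term can act as an upper bound on instance size in dense scenes. It places no lower bound on size, a role the abstract assigns to the Voronoi watershed loss.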

Paper Structure

This paper contains 19 sections, 21 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: Related methods, their principles for knowledge mining, whether they use additional priors, and their performance on DOTA-v1.0.
  • Figure 2: Visual comparisons with state-of-the-art methods, including Point2Mask (2023), PointOBB (2024), PointOBB-v2 (2025), PointOBB-v3 (2025), and Point2RBox (2024). The boxes detected by our method (last row) wrap the objects more tightly.
  • Figure 3: The training pipeline of Point2RBox-v2. Gaussian overlap loss and Voronoi watershed loss utilize the spatial layout (see Fig. \ref{fig:loss}), while edge loss (see Sec. \ref{sec:method-le}), symmetry-aware learning (see Sec. \ref{sec:method-lss}), and copy-paste (see Sec. \ref{sec:method-cp}) further enhance the method.
  • Figure 4: Illustration of the three newly proposed loss functions and their impact on the learning results. (a) Gaussian overlap loss (see Sec. \ref{sec:method-lo}). (b) Voronoi watershed loss (see Sec. \ref{sec:method-lw}). (c) Edge loss (see Sec. \ref{sec:method-le}).
  • Figure 5: Qualitative analysis of failure cases and overlap cases.