Theoretically Achieving Continuous Representation of Oriented Bounding Boxes

Zi-Kai Xiao, Guo-Ye Yang, Xue Yang, Tai-Jiang Mu, Junchi Yan, Shi-min Hu

TL;DR

This work tackles the discontinuity of Oriented Bounding Box (OBB) representations in Oriented Object Detection (OOD). It introduces COBB, a theoretically continuous OBB representation that encodes an OBB with nine parameters, each a continuous function of the outer Horizontal Bounding Box (HBB) and the OBB area: the outer HBB, a sliding ratio, and four IoU scores that disambiguate the candidate OBBs at decoding time and so avoid decoding ambiguity. A modular JDet-based benchmark enables fair, reproducible comparisons across methods and datasets; empirical results show gains in high-precision detection (notably mAP75) over strong baselines on DOTA and related datasets. The approach comes with formal continuity guarantees for both encoding and decoding, improves performance across detection models without special tricks, and outlines avenues for future integration with rotation-equivariant detectors.

Abstract

Considerable efforts have been devoted to Oriented Object Detection (OOD). However, the long-standing issue of discontinuity in Oriented Bounding Box (OBB) representation remains unresolved and is an inherent bottleneck for existing OOD methods. This paper endeavors to completely solve this issue in a theoretically guaranteed manner and to put an end to the ad-hoc efforts in this direction. Prior studies can typically address only one of the two cases of discontinuity (rotation and aspect ratio) and often inadvertently introduce decoding discontinuity, e.g., Decoding Incompleteness (DI) and Decoding Ambiguity (DA), as discussed in the literature. Specifically, we propose a novel representation method called Continuous OBB (COBB), which can be readily integrated into existing detectors, e.g., Faster-RCNN, as a plugin. It theoretically ensures continuity in bounding box regression, which, to the best of our knowledge, has not been achieved in the literature for rectangle-based object representation. For fairness and transparency of experiments, we have developed a modularized benchmark for OOD evaluation based on JDet, the detection toolbox of the open-source deep learning framework Jittor. On the popular DOTA dataset, with Faster-RCNN as the shared baseline model, our method outperforms the peer method Gliding Vertex by 1.13% mAP50 (relative improvement 1.54%) and 2.46% mAP75 (relative improvement 5.91%), without any tricks.
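The two numbers reported for each metric are related by simple arithmetic: the absolute gain is measured in mAP points, and the relative improvement is that gain divided by the baseline score. The short Python sketch below back-solves the approximate Gliding Vertex baseline from the rounded figures quoted in the abstract. The helper name `implied_baseline` is hypothetical, and the results are only as precise as the rounding of the quoted percentages allows.

```python
def implied_baseline(abs_gain_pts: float, rel_gain_pct: float) -> float:
    """Baseline score (in mAP points) implied by: relative = absolute / baseline."""
    return abs_gain_pts / (rel_gain_pct / 100.0)

# Figures quoted in the abstract (rounded, so the results are approximate).
map50_base = implied_baseline(1.13, 1.54)  # roughly 73.4 mAP50
map75_base = implied_baseline(2.46, 5.91)  # roughly 41.6 mAP75

print(f"implied Gliding Vertex mAP50 ~ {map50_base:.1f}, with COBB ~ {map50_base + 1.13:.1f}")
print(f"implied Gliding Vertex mAP75 ~ {map75_base:.1f}, with COBB ~ {map75_base + 2.46:.1f}")
```

The larger relative gain at mAP75 reflects both a larger absolute gain and a lower baseline at the stricter IoU threshold.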

Paper Structure (36 sections, 26 equations, 8 figures, 12 tables)

Figures (8)

  • Figure 1: Examples of Discontinuity in OBB Representations. (a) Acute-angle Representation restricts the rotation angle of OBBs to a range of width $\frac{\pi}{2}$ ($[-\frac{\pi}{4},\frac{\pi}{4})$ in this example). The red $\text{OBB}_1$ and the blue $\text{OBB}_2$ are similar, but their representations differ significantly. (b) Long-edge Representation defines the rotation angle $\theta$ as the angle between the long side and the x-axis. A slight perturbation in the aspect ratio of a square-like OBB causes a large change in its representation, which is Aspect Ratio Discontinuity. (c) CSL [yang2020arbitrary] divides the rotation angle into discrete classes (6 classes in this figure). An OBB whose angle falls between two classes cannot be represented exactly, which leads to DI. (d) GWD [yang2021rethinking] represents OBBs as Gaussian distributions. Because squares with different rotation angles can correspond to the same Gaussian, the orientation of decoded squares is ambiguous (DA).
  • Figure 2: Example of COBB. COBB utilizes the outer HBB ($x_c, y_c, w, h$), sliding ratio $r_s$, and four IoU scores. (a) Example of the outer HBB and $r_s$. In this instance, $r_s=\frac{y_2-y_1}{h}$ when $w>h$, where $y_1$ and $y_2$ denote the two smaller y-coordinates among the four vertices of the OBB. (b) Using $x_c$, $y_c$, $w$, $h$, and $r_s$, along with the properties of similar triangles, we can derive and solve a system of equations to obtain the parameters for four OBBs (details provided in the supplemental material). Distinguishing between these OBBs is guided by the positional relationship between their vertices and the midpoints on each side.
  • Figure 3: Visual results of KLD [yang2021learning] and ours. Due to DA, KLD struggles to accurately predict the orientation of square-like objects. In contrast, our COBB avoids DA and therefore predicts the orientation of square-like objects more precisely.
  • Figure 4: An OBB with its outer HBB.
  • Figure 5: COBB for Oriented Proposals.
  • ...and 3 more figures
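
The captions of Figure 1(a) and Figure 2(a) can be made concrete with a small numerical sketch. The Python below is illustrative only and is not the paper's or JDet's implementation: the function names are hypothetical, only five of the nine COBB parameters are computed (the outer HBB and $r_s$; the four IoU scores are omitted), and the x-based formula used when $w \le h$ is an assumption made by symmetry, since the caption only states the $w>h$ case. It takes two nearly identical OBBs whose angles straddle $\frac{\pi}{4}$ and compares how far apart their encodings land.

```python
import math

def obb_vertices(xc, yc, w, h, theta):
    """Four corners of a w-by-h rectangle centered at (xc, yc), rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    corners = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(xc + dx * c - dy * s, yc + dx * s + dy * c) for dx, dy in corners]

def acute_angle_repr(w, h, theta):
    """Wrap theta into [-pi/4, pi/4), swapping w and h on every quarter turn."""
    while theta >= math.pi / 4:
        theta -= math.pi / 2
        w, h = h, w
    while theta < -math.pi / 4:
        theta += math.pi / 2
        w, h = h, w
    return (w, h, theta)

def outer_hbb_and_sliding_ratio(vertices):
    """Outer HBB (xc, yc, w, h) and sliding ratio r_s of an OBB given its corners."""
    xs = sorted(x for x, _ in vertices)
    ys = sorted(y for _, y in vertices)
    w, h = xs[-1] - xs[0], ys[-1] - ys[0]
    xc, yc = (xs[0] + xs[-1]) / 2, (ys[0] + ys[-1]) / 2
    if w > h:
        # Case stated in the Figure 2 caption: two smallest y-coordinates over h.
        r_s = (ys[1] - ys[0]) / h
    else:
        # ASSUMPTION (by symmetry, not stated in the caption): x-based analogue.
        r_s = (xs[1] - xs[0]) / w
    return (xc, yc, w, h, r_s)

# Two almost identical 4x2 boxes whose angles straddle pi/4.
eps = 1e-3
box1 = (4.0, 2.0, math.pi / 4 - eps)
box2 = (4.0, 2.0, math.pi / 4 + eps)

rep1, rep2 = acute_angle_repr(*box1), acute_angle_repr(*box2)
enc1 = outer_hbb_and_sliding_ratio(obb_vertices(0.0, 0.0, *box1))
enc2 = outer_hbb_and_sliding_ratio(obb_vertices(0.0, 0.0, *box2))

# Largest per-parameter difference between the two encodings.
acute_gap = max(abs(a - b) for a, b in zip(rep1, rep2))
hbb_gap = max(abs(a - b) for a, b in zip(enc1, enc2))

print(f"acute-angle encoding gap:     {acute_gap:.4f}")
print(f"outer HBB + r_s encoding gap: {hbb_gap:.4f}")
```

Under these assumptions, the acute-angle encoding swaps $w$ and $h$ and flips the sign of the angle across the boundary, so its gap stays large however small `eps` is, while the outer HBB and $r_s$ change only by an amount on the order of `eps`.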