Linear Gaussian Bounding Box Representation and Ring-Shaped Rotated Convolution for Oriented Object Detection
Zhen Zhou, Yunkai Ma, Junfeng Fan, Zhaoyang Liu, Fengshui Jing, Min Tan
TL;DR
The paper tackles boundary discontinuity and numerical instability in oriented object detection with two contributions. The first is the Linear Gaussian Bounding Box (LGBB), which linearly transforms the elements of the Gaussian bounding box (GBB) so that regression is numerically stable. Because LGBB is derived from the Gaussian representation of an oriented bounding box (OBB), it avoids boundary discontinuity, and a positive definite constraint keeps the underlying covariance matrix valid. The second is Ring-Shaped Rotated Convolution (RRC), which adaptively rotates feature maps to extract rotation-sensitive features under a ring-shaped receptive field. RRC is built from modular components (AGM, PEM, and RFEM) and aggregates rotation-aware features and context faster than convolutions with only local receptive fields. The approach achieves state-of-the-art results on DOTA-v1.5 and HRSC2016 and is shown to be plug-and-play in detectors such as YOLOv7 and mmrotate-based models, improving accuracy at a manageable speed cost. These contributions offer a practical path to more accurate oriented object detection in aerial and maritime imagery, with potential for other rotation-sensitive vision tasks.
Abstract
In oriented object detection, current representations of oriented bounding boxes (OBBs) often suffer from the boundary discontinuity problem. Methods that design continuous regression losses do not fundamentally solve this problem. Although the Gaussian bounding box (GBB) representation avoids it, directly regressing the GBB is susceptible to numerical instability. We propose linear GBB (LGBB), a novel OBB representation. By linearly transforming the elements of the GBB, LGBB avoids the boundary discontinuity problem and has high numerical stability. In addition, existing convolution-based methods for extracting rotation-sensitive features have only local receptive fields, resulting in slow feature aggregation. We propose ring-shaped rotated convolution (RRC), which adaptively rotates feature maps to arbitrary orientations to extract rotation-sensitive features under a ring-shaped receptive field, rapidly aggregating features and contextual information. Experimental results demonstrate that LGBB and RRC achieve state-of-the-art performance. Furthermore, integrating LGBB and RRC into various models effectively improves detection accuracy.
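
To make concrete why a Gaussian representation sidesteps boundary discontinuity, the following minimal sketch converts an OBB to a GBB and compares two parameterizations of the same rectangle. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the covariance scaling Sigma = R diag(w^2/4, h^2/4) R^T is a convention common in Gaussian-based detection losses that the paper may define with a different constant. The linear transform that defines LGBB is not given in this summary, so it is not shown.

```python
import math


def obb_to_gbb(cx, cy, w, h, theta):
    """Map an OBB (center, width, height, angle in radians) to a Gaussian.

    Returns the mean (cx, cy) and the covariance entries (a, b, c) of
    Sigma = [[a, c], [c, b]].
    Assumption: Sigma = R diag(w^2/4, h^2/4) R^T, where R is the 2D rotation
    matrix for theta. The scaling constant is a convention, not from the source.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    s1, s2 = w * w / 4.0, h * h / 4.0
    a = s1 * cos_t ** 2 + s2 * sin_t ** 2
    b = s1 * sin_t ** 2 + s2 * cos_t ** 2
    c = (s1 - s2) * cos_t * sin_t
    return (cx, cy), (a, b, c)


def is_positive_definite(a, b, c):
    """Sylvester's criterion for the symmetric 2x2 matrix [[a, c], [c, b]]."""
    return a > 0 and a * b - c * c > 0


# The same rectangle written two ways: (w, h, theta) and (h, w, theta + pi/2).
# As OBB parameters the two tuples are far apart, which is the discontinuity.
mean_1, cov_1 = obb_to_gbb(50.0, 50.0, 40.0, 10.0, 0.0)
mean_2, cov_2 = obb_to_gbb(50.0, 50.0, 10.0, 40.0, math.pi / 2)

same_gaussian = mean_1 == mean_2 and all(
    math.isclose(p, q, abs_tol=1e-9) for p, q in zip(cov_1, cov_2)
)
print(same_gaussian, is_positive_definite(*cov_1))
```

Both parameterizations yield approximately (a, b, c) = (400, 25, 0), so a regression target expressed through the Gaussian does not jump when the angle crosses its range boundary. The positive definiteness check mirrors the validity condition that a predicted covariance has to satisfy before it can be mapped back to a box.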
