Real-Time Oriented Object Detection Transformer in Remote Sensing Images

Zeyu Ding, Yong Zhou, Jiaqi Zhao, Wen-Liang Du, Xixi Li, Rui Yao, Abdulmotaleb El Saddik

Abstract

Recent real-time detection transformers have gained popularity due to their simplicity and efficiency. However, these detectors do not explicitly model object rotation. In remote sensing imagery, where objects appear at arbitrary angles, this leads to challenges in angle representation, matching cost, and training stability. In this paper, we propose a real-time oriented object detection transformer that addresses these issues; to the best of our knowledge, it is the first real-time end-to-end oriented object detector. Specifically, we propose angle distribution refinement, which reformulates angle regression as an iterative refinement of probability distributions, thereby capturing the uncertainty of object rotation and providing a more fine-grained angle representation. We then incorporate a Chamfer distance cost into bipartite matching, which measures box distance via vertex sets, enabling more accurate geometric alignment and eliminating ambiguous matches. Moreover, we propose oriented contrastive denoising to stabilize training and analyze four noise modes. We also observe that a ground truth can be assigned to queries with different indices across decoder layers, and we analyze this issue using the proposed instability metric. We design a series of model variants and experiments to validate the proposed method. Notably, our O2-DFINE-L, O2-RTDETR-R50, and O2-DEIM-R50 achieve 77.73%/78.45%/80.15% AP50 on DOTA1.0 and 132/119/119 FPS, respectively, on the 2080ti GPU. Code is available at https://github.com/wokaikaixinxin/ai4rs.
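
To make the idea of refining an angle as a probability distribution more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a fixed set of candidate angle offsets (`bin_values`, hypothetical), residual logit updates from each decoder layer (`layer_deltas`, hypothetical), and decoding by taking the expectation of the softmax distribution. The actual bin layout, weighting, and losses used in the paper are not specified in the abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_angle(logits, bin_values):
    """Decode an angle offset as the expectation over the bin values."""
    probs = softmax(logits)
    return sum(p * v for p, v in zip(probs, bin_values))


def refine(initial_logits, layer_deltas):
    """Accumulate per-layer residual logits; return the logits after each layer."""
    history = []
    current = list(initial_logits)
    for delta in layer_deltas:
        current = [a + b for a, b in zip(current, delta)]
        history.append(list(current))
    return history


# Hypothetical setup: five candidate angle offsets in radians.
bin_values = [-0.2, -0.1, 0.0, 0.1, 0.2]
initial_logits = [0.0] * 5  # uniform distribution, i.e. no preferred offset
# Hypothetical residuals: each layer shifts probability mass toward +0.1 rad.
layer_deltas = [[0.0, 0.0, 1.0, 2.0, 0.0], [0.0, 0.0, 0.0, 2.0, 0.0]]

history = refine(initial_logits, layer_deltas)
for layer_index, logits in enumerate(history, start=1):
    print(layer_index, decode_angle(logits, bin_values))
```

In this sketch the spread of the distribution serves as a rough proxy for rotation uncertainty: a flat distribution decodes to an offset near zero, while each residual update sharpens the distribution around the favored bin.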

Paper Structure (18 sections, 18 equations, 14 figures, 10 tables, 1 algorithm)

Figures (14)

  • Figure 1: Compared to existing real-time object detectors, the family of O$^2$-DFINE, O$^2$-RTDETR, and O$^2$-DEIM achieves competitive performance.
  • Figure 2: The number of remaining oriented boxes above different confidence thresholds in rotated NMS.
  • Figure 3: Execution time of rotated NMS with different numbers of non-overlapping oriented boxes on the 2080ti (TensorRT FP16).
  • Figure 4: Overview of O$^2$-DFINE with Angle Distribution Refinement. Oriented boxes are learned in a decoupled manner using probability distributions as fine-grained representations of the external rectangle and vertex offsets, which are iteratively refined across decoder layers in a residual fashion.
  • Figure 5: Distance cost for bipartite matching. (a) Chamfer distance (ours). (b) Kullback-Leibler Divergence. (c) L1 distance. (d) Hausdorff distance.
  • ...and 9 more figures
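
As an illustration of the Chamfer distance in Figure 5(a), the sketch below computes a symmetric Chamfer distance between the vertex sets of two oriented boxes. It is a hedged example under stated assumptions: boxes are given as `(cx, cy, w, h, theta)` with `theta` in radians, point distances are Euclidean, and each direction is averaged over the four vertices. The normalization and norm used in the paper's matching cost may differ, and the function names are hypothetical.

```python
import math


def obb_vertices(cx, cy, w, h, theta):
    """Return the four corner points of an oriented box (theta in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    corners = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in corners]


def chamfer_distance(box_a, box_b):
    """Symmetric Chamfer distance between the vertex sets of two oriented boxes."""
    points_a = obb_vertices(*box_a)
    points_b = obb_vertices(*box_b)

    def one_way(src, dst):
        # Mean distance from each source vertex to its nearest target vertex.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


# Two parameterizations of the same rectangle: their parameter vectors differ
# in w, h, and theta, but their vertex sets coincide.
same_a = (0.0, 0.0, 4.0, 2.0, 0.0)
same_b = (0.0, 0.0, 2.0, 4.0, math.pi / 2)
print(chamfer_distance(same_a, same_b))

# Two axis-aligned squares shifted horizontally by 3 units.
shifted_a = (0.0, 0.0, 2.0, 2.0, 0.0)
shifted_b = (3.0, 0.0, 2.0, 2.0, 0.0)
print(chamfer_distance(shifted_a, shifted_b))
```

Because the distance is computed on vertices rather than on the raw box parameters, two parameterizations of the same rectangle yield a distance of approximately zero, whereas an L1 distance on `(cx, cy, w, h, theta)` would report them as far apart. This is one way to read the abstract's claim about eliminating ambiguous matches.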