Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection

Xinbin Yuan; Zhaohui Zheng; Yuxuan Li; Xialei Liu; Li Liu; Xiang Li; Qibin Hou; Ming-Ming Cheng

TL;DR

Strip R-CNN tackles the challenge of detecting slender, high-aspect-ratio objects in remote sensing imagery by introducing large orthogonal strip convolutions in a lightweight StripNet backbone and a dedicated Strip Head. The method decouples localization from classification and angle prediction, leveraging long-range directional features to improve precision for rotated boxes. It achieves state-of-the-art results on DOTA-v1.0 (82.75% mAP with ensembling) and strong performance on FAIR1M, HRSC2016, and DIOR, while maintaining a simple, efficient design. The findings highlight strip convolutions as a practical and effective alternative to large-square kernels for remote sensing object detection.

Abstract

Despite rapid progress, remote sensing object detection still struggles with high-aspect-ratio objects. This paper shows that large strip convolutions are good feature representation learners for remote sensing object detection and can detect objects of various aspect ratios well. Based on large strip convolutions, we build a new network architecture called Strip R-CNN, which is simple, efficient, and powerful. Unlike recent remote sensing object detectors that leverage large-kernel convolutions with square shapes, Strip R-CNN uses sequential orthogonal large strip convolutions in its backbone network, StripNet, to capture spatial information. In addition, we improve the localization capability of remote sensing object detectors by decoupling the detection heads and equipping the localization branch with strip convolutions in our Strip Head. Extensive experiments on several benchmarks, including DOTA, FAIR1M, HRSC2016, and DIOR, show that Strip R-CNN substantially improves on previous work. In particular, our 30M model achieves 82.75% mAP on DOTA-v1.0, setting a new state-of-the-art record. Code is available at https://github.com/YXB-NKU/Strip-R-CNN.
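To make the idea of sequential orthogonal strip convolutions concrete, here is a minimal, dependency-free sketch. It is not the authors' StripNet code: it assumes a single channel, odd kernel lengths, zero padding, and all-ones weights, and the names `conv_rows`, `conv_cols`, `strip_pair`, and `param_counts` are hypothetical helpers introduced only for illustration. It shows that a 1 x k strip followed by a k x 1 strip reaches a k x k neighborhood while using 2k weights instead of k * k.

```python
def conv_rows(grid, kernel):
    """Apply a 1 x k horizontal strip kernel (cross-correlation, zero padding)."""
    k = len(kernel)
    pad = k // 2  # assumes k is odd so the output keeps the input size
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, weight in enumerate(kernel):
                xx = x + i - pad
                if 0 <= xx < w:  # positions outside the grid count as zero
                    acc += weight * grid[y][xx]
            out[y][x] = acc
    return out


def conv_cols(grid, kernel):
    """Apply a k x 1 vertical strip kernel by transposing, filtering rows, transposing back."""
    transposed = [list(col) for col in zip(*grid)]
    filtered = conv_rows(transposed, kernel)
    return [list(col) for col in zip(*filtered)]


def strip_pair(grid, horizontal, vertical):
    """Sequential orthogonal strips: horizontal pass first, then vertical pass."""
    return conv_cols(conv_rows(grid, horizontal), vertical)


def param_counts(k):
    """Weights per channel: one k x k square kernel vs. a 1 x k plus k x 1 strip pair."""
    return {"square": k * k, "strip_pair": 2 * k}


# Illustrative sizes only (not taken from the paper): 9 x 9 grid, strip length 5.
size, k = 9, 5
impulse = [[0.0] * size for _ in range(size)]
impulse[size // 2][size // 2] = 1.0  # single active pixel in the center

response = strip_pair(impulse, [1.0] * k, [1.0] * k)
footprint = sum(1 for row in response for value in row if value != 0.0)

print("pixels reached by the strip pair:", footprint)
print("weights per channel:", param_counts(k))
```

With these illustrative sizes the center pixel spreads over a full 5 x 5 block, so the strip pair covers the same area as a 5 x 5 square kernel with 10 weights instead of 25. The saving grows with k, which is one reason long strips are attractive for elongated objects.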

Paper Structure

This paper contains 20 sections, 1 equation, 10 figures, and 18 tables.

Introduction
Related Work
Strip R-CNN
Overall Architecture
Strip Module
Detection Head with Strip Convolutions
Experiments
Experiment Setup
Main Results
Ablation Studies
Visual Analysis
Conclusions
Dataset details
Implementation details
FAIR1M benchmark results
...and 5 more sections

Figures (10)

Figure 1: Statistics of the DOTA dataset [dota_set] and the detection performance of several recent state-of-the-art detectors. Slender objects (aspect ratio $> 3$) make up a non-negligible proportion of the dataset, and the detection performance of previous state-of-the-art models declines as the aspect ratio increases.
Figure 2: A comprehensive comparison of detection performance on the DOTA dataset of various remote sensing object detectors.
Figure 3: Structural comparison between our proposed strip module and other representative methods using large-kernel convolutions, including LSKNet [lsknet] and PKINet [cai2024poly].
Figure 4: Structure of our basic block of StripNet backbone.
Figure 5: Structural comparison of the Oriented R-CNN head and our strip head. In the strip head, the classification and angle prediction heads share two fully connected layers, while the localization head incorporates our strip module.
...and 5 more figures
