Poly Kernel Inception Network for Remote Sensing Detection

Xinhao Cai; Qiuxia Lai; Yuwei Wang; Wenguan Wang; Zeren Sun; Yazhou Yao

TL;DR

This paper introduces the Poly Kernel Inception Network (PKINet), which employs multi-scale convolution kernels without dilation to extract object features at varying scales and capture local context in remote sensing images.

Abstract

Object detection in remote sensing images (RSIs) faces several growing challenges, including large variation in object scales and wide-ranging context. Prior methods have tried to address these challenges by expanding the spatial receptive field of the backbone, either through large-kernel convolution or dilated convolution. However, the former typically introduces considerable background noise, while the latter risks generating overly sparse feature representations. In this paper, we introduce the Poly Kernel Inception Network (PKINet) to handle these challenges. PKINet employs multi-scale convolution kernels without dilation to extract object features of varying scales and capture local context. In addition, a Context Anchor Attention (CAA) module is introduced in parallel to capture long-range contextual information. These two components work jointly to advance the performance of PKINet on four challenging remote sensing detection benchmarks, namely DOTA-v1.0, DOTA-v1.5, HRSC2016, and DIOR-R.
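The sparsity concern about dilated convolution can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the paper: the helper name `kernel_footprint` and the example sizes (a dense 11x11 kernel versus a 3x3 kernel with dilation 5) are illustrative assumptions chosen so that both kernels span the same 11x11 window.

```python
def kernel_footprint(kernel_size: int, dilation: int = 1) -> dict:
    """Span and sampling density of a square 2-D convolution kernel.

    Illustrative helper (an assumption, not the paper's code): a k x k kernel
    with dilation d spans d * (k - 1) + 1 pixels per side, but it only reads
    k * k of the pixels inside that window.
    """
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive")
    span = dilation * (kernel_size - 1) + 1
    sampled = kernel_size * kernel_size
    return {
        "span": span,                          # side length of the covered window
        "sampled": sampled,                    # pixels actually read
        "density": sampled / (span * span),    # fraction of the window that is read
    }


# Example sizes are hypothetical; both kernels cover an 11 x 11 window.
dense = kernel_footprint(11)                 # reads all 121 pixels
dilated = kernel_footprint(3, dilation=5)    # reads only 9 of the 121 pixels

print(dense)
print(dilated)
```

Under these assumed sizes, the dilated kernel reaches the same spatial extent while reading about 7% of the window (9 of 121 positions), which is one way to read the abstract's remark about overly sparse features. A dense kernel avoids the gaps but takes in every background pixel in the window.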

Paper Structure (14 sections, 9 equations, 3 figures, 9 tables)

Introduction
Related Work
Methodology
PKI Stage
PKI Module
Context Anchor Attention (CAA)
Implementation Details
Experiment
Experimental Setup
Quantitative Results
Qualitative Results
Diagnostic Experiments
Analysis
Discussion and Conclusion

Figures (3)

Figure 1: Top: Our approach yields solid performance gains over various remote sensing detectors [tian2019fcos, yang2021r3det, han2021align, xie2021oriented, ding2019learning] with fewer parameters on DOTA-v1.0 [xia2018dota]. Bottom: Networks with small kernels miss long-range context in large object detection, whereas those with large kernels introduce noise for small objects. Our multi-scale convolution, however, handles scale variations well.
Figure 2: PKINet overview. (a) PKINet consists of four stages, where the output of the $l$-th stage has size $C_l \times H_l \times W_l$. (b) Each stage (see "PKI Stage") employs a cross-stage partial (CSP) structure, where the input is split in half along the channel dimension and fed to a Feed-Forward Network (FFN) and a sequence of $N_l$ PKI Blocks, respectively. (c) Each PKI Block contains (d) a PKI Module (see "PKI Module") and (e) a CAA Module (see "Context Anchor Attention (CAA)"). Here, $n = 0, \dots, N_l - 1$ indicates that the PKI/CAA Module is in the $n$-th PKI Block of the $l$-th stage.
Figure 3: Visual results on the DOTA-v1.0 dataset [xia2018dota]. Top: LSKNet [Li_2023_ICCV]; Bottom: our PKINet. See "Qualitative Results" for details.
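
The channel routing described in the Figure 2 caption can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the function name `csp_stage`, the list-based feature type, the choice of which half goes to which path, and the final merge by channel concatenation are all hypothetical, since the caption only states that the input is split in half and sent to an FFN and a sequence of blocks.

```python
from typing import Callable, List, Sequence

Channel = List[float]      # one flattened feature map (toy stand-in)
Features = List[Channel]   # a list of C channels
Layer = Callable[[Features], Features]


def csp_stage(x: Features, ffn: Layer, blocks: Sequence[Layer]) -> Features:
    """Split channels in half, route each half down one path, then rejoin."""
    if len(x) % 2 != 0:
        raise ValueError("channel count must be even to split in half")
    half = len(x) // 2

    # Assumption: the first half goes to the FFN, the second half to the blocks.
    ffn_path = ffn(x[:half])
    block_path = x[half:]
    for block in blocks:               # the N_l blocks are applied in sequence
        block_path = block(block_path)

    # Assumption: the two paths are merged by concatenating their channels.
    return ffn_path + block_path


def identity(feats: Features) -> Features:
    """Stand-in for the FFN path."""
    return feats


def double(feats: Features) -> Features:
    """Stand-in for a PKI Block; it simply scales every value by 2."""
    return [[2.0 * v for v in ch] for ch in feats]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]   # C = 4 toy channels
out = csp_stage(x, ffn=identity, blocks=[double, double])
print(out)
```

With these stand-ins, the first two channels pass through unchanged and the last two are scaled twice, so the channel count is preserved while only half of the channels pay the cost of the block sequence.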
