MKSNet: Advanced Small Object Detection in Remote Sensing Imagery with Multi-Kernel and Dual Attention Mechanisms

Jiahao Zhang, Xiao Zhao, Guangyu Gao

TL;DR

This work addresses small object detection in high-resolution remote sensing imagery by introducing MKSNet, which combines multi-kernel spatial feature extraction with a dual attention framework (spatial and channel attention). The approach uses large convolutional kernels to capture contextual information across scales and a dual-attention fusion to suppress background clutter while preserving informative features. Results on DOTA-v1.0 and HRSC2016 show state-of-the-art performance and faster convergence, with notable improvements over ResNet-50 baselines. The ablation study indicates that both spatial attention (SA) and channel attention (CA) contribute significantly to performance, supporting the proposed multi-kernel selection and attention design for complex, high-resolution remote sensing data.

Abstract

Deep convolutional neural networks (DCNNs) have substantially advanced object detection capabilities, particularly in remote sensing imagery. However, challenges persist, especially in detecting small objects, where the high resolution of these images and the small size of target objects often result in a loss of critical information in the deeper layers of conventional CNNs. Additionally, the extensive spatial redundancy and intricate background details typical of remote sensing images tend to obscure these small targets. To address these challenges, we introduce the Multi-Kernel Selection Network (MKSNet), a network architecture built around a novel Multi-Kernel Selection (MKS) mechanism. The MKS mechanism uses large convolutional kernels to capture an extensive range of contextual information. This design allows for adaptive kernel size selection, enhancing the network's ability to dynamically process and emphasize crucial spatial details for small object detection. MKSNet also incorporates a dual attention mechanism that merges spatial and channel attention modules. The spatial attention module adaptively adjusts the spatial weights of feature maps, focusing more intensively on relevant regions while mitigating background noise. Simultaneously, the channel attention module optimizes channel information selection, improving feature representation and detection accuracy. Empirical evaluations on the DOTA-v1.0 and HRSC2016 benchmarks demonstrate that MKSNet substantially surpasses existing state-of-the-art models in detecting small objects in remote sensing images. These results highlight MKSNet's ability to manage the complexities associated with multi-scale and high-resolution image data, confirming its effectiveness in remote sensing object detection.
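The idea of adaptive kernel size selection can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the mean filters stand in for learned convolution kernels, the kernel sizes (3, 5, 7) are arbitrary, and the scoring rule (global average of each branch followed by a softmax) is an assumption borrowed from generic selective-kernel designs. The names `box_filter` and `multi_kernel_select` are hypothetical.

```python
import math


def box_filter(image, k):
    """Apply a k x k mean filter with zero padding.

    Stand-in for a learned k x k convolution kernel (assumption).
    """
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += image[y][x]
            out[i][j] = total / (k * k)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def multi_kernel_select(image, kernel_sizes=(3, 5, 7)):
    """Fuse branches with different receptive fields using softmax weights."""
    h, w = len(image), len(image[0])
    branches = [box_filter(image, k) for k in kernel_sizes]
    # One scalar descriptor per branch: its global average response.
    descriptors = [sum(map(sum, b)) / (h * w) for b in branches]
    weights = softmax(descriptors)
    fused = [
        [sum(wt * b[i][j] for wt, b in zip(weights, branches)) for j in range(w)]
        for i in range(h)
    ]
    return fused, weights


# Toy input: a single bright pixel (a "small object") on an empty background.
toy = [[0.0] * 7 for _ in range(7)]
toy[3][3] = 1.0
fused_map, branch_weights = multi_kernel_select(toy)
print(branch_weights)
```

In a trained network the selection weights would come from learned layers and could vary per location; the sketch only shows the overall pattern of parallel branches with different receptive fields followed by a normalized, input-dependent fusion.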

Paper Structure

This paper contains 24 sections, 7 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Roundabout Detection in DOTA-v1.0. On the left, traditional networks struggle to distinguish the roundabout from similar structures such as intersections and storage tanks, due to small convolutional kernels. The right image shows a heatmap from MKSNet, highlighting its superior ability to capture contextual details and accurately identify the roundabout.
  • Figure 2: Comparative Heatmaps and Detection Results for Roundabout Recognition. The first heatmap shows a network with fewer large kernels focusing on local features of the roundabout, while the second highlights a network with more large kernels capturing broader contextual information. The third image demonstrates MKSNet's accurate and rapid recognition of the roundabout, leveraging enhanced contextual understanding.
  • Figure 3: Overall framework of MKSNet. The MKSNet comprises a sequence of MKS blocks, with each block incorporating a Channel Attention module (at the top) and a Spatial Attention module (at the bottom). It starts with the input image being divided into patches via a convolutional layer. These patches undergo enhancement by the Channel Attention to emphasize significant channels, followed by the Spatial Attention focusing on key areas. The MKSNet dynamically selects various kernel sizes to capture and integrate multi-scale contextual information, significantly improving detection performance.
  • Figure 4: mAP comparison over 100 epochs between MKSNet and ResNet-50.
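
The ordering described in the Figure 3 caption (channel attention first, then spatial attention) can be sketched in plain Python on a small channels x height x width feature map. This is a simplified illustration under stated assumptions, not the paper's module: the gates here use fixed pooling plus a sigmoid, whereas a real implementation would use learned layers, and the function names `channel_attention`, `spatial_attention`, and `dual_attention` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by a sigmoid gate computed from its global average."""
    out = []
    for ch in feat:
        avg = sum(map(sum, ch)) / (len(ch) * len(ch[0]))
        gate = sigmoid(avg)
        out.append([[v * gate for v in row] for row in ch])
    return out


def spatial_attention(feat):
    """Scale each location by a sigmoid gate from the channel-wise mean and max."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            values = [feat[k][i][j] for k in range(c)]
            gate = sigmoid(sum(values) / c + max(values))
            for k in range(c):
                out[k][i][j] = feat[k][i][j] * gate
    return out


def dual_attention(feat):
    """Channel attention followed by spatial attention, as in the caption's order."""
    return spatial_attention(channel_attention(feat))


# Toy feature map: 2 channels, 2 x 2 pixels, one strong activation.
features = [
    [[0.0, 0.0], [0.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
print(dual_attention(features))
```

Because both gates lie strictly between 0 and 1, the sketch can only attenuate activations; locations and channels with stronger responses are attenuated less, which is the background-suppression behavior the captions describe in qualitative terms.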