
Enhancing, Refining, and Fusing: Towards Robust Multi-Scale and Dense Ship Detection

Congxia Zhao, Xiongjun Fu, Jian Dong, Shen Cao, Chunyan Zhang

TL;DR

This work tackles robust ship detection in synthetic aperture radar (SAR) images, a task complicated by clutter, densely packed targets, and large scale variation. It introduces CASS-Det, a center-aware one-stage detector built from three modules: a Center Enhancement Module (CEM) that uses rotational convolution to emphasize ship centers, a Neighbor Attention Module (NAM) that refines boundaries through cross-layer dependencies, and a Cross-Connected Feature Pyramid Network (CC-FPN) for stronger multi-scale fusion. Built on a CSPDarknet backbone with a YOLOX head, the method achieves state-of-the-art performance on SSDD, HRSID, and LS-SSDD, with notable gains on small and densely packed ships. The approach has practical value for maritime surveillance and SAR image analysis, and shows how targeted architectural components can mitigate SAR-specific challenges in real-world scenarios.

Abstract

Synthetic aperture radar (SAR) imaging, celebrated for its high resolution, all-weather capability, and day-night operability, is indispensable for maritime applications. However, ship detection in SAR imagery faces significant challenges, including complex backgrounds, densely arranged targets, and large scale variations. To address these issues, we propose a novel framework, Center-Aware SAR Ship Detector (CASS-Det), designed for robust multi-scale and densely packed ship detection. CASS-Det integrates three key innovations: (1) a center enhancement module (CEM) that employs rotational convolution to emphasize ship centers, improving localization while suppressing background interference; (2) a neighbor attention module (NAM) that leverages cross-layer dependencies to refine ship boundaries in densely populated scenes; and (3) a cross-connected feature pyramid network (CC-FPN) that enhances multi-scale feature fusion by integrating shallow and deep features. Extensive experiments on the SSDD, HRSID, and LS-SSDD-v1.0 datasets demonstrate the state-of-the-art performance of CASS-Det, excelling at detecting multi-scale and densely arranged ships.
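To make the idea of rotational convolution in CEM concrete, here is a minimal sketch in plain Python. It assumes that the rotations $R(k)$ are multiples of 90°, that the four responses ($n_1, \dots, n_4$ in Figure 2's notation) are fused by summation, and that no learned weights or normalization are involved. The helper names `rot90`, `conv2d_same`, and `rotational_conv` are hypothetical; this is an illustration of the general technique, not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]


def rot90(m: Matrix) -> Matrix:
    """Rotate a square matrix 90 degrees counter-clockwise."""
    n = len(m)
    return [[m[j][n - 1 - i] for j in range(n)] for i in range(n)]


def conv2d_same(x: Matrix, k: Matrix) -> Matrix:
    """Zero-padded 2-D cross-correlation (as in CNN layers); output has x's shape."""
    h, w, size = len(x), len(x[0]), len(k)
    r = size // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(size):
                for b in range(size):
                    y, z = i + a - r, j + b - r
                    if 0 <= y < h and 0 <= z < w:  # zero padding outside the map
                        acc += x[y][z] * k[a][b]
            out[i][j] = acc
    return out


def rotational_conv(x: Matrix, f: Matrix) -> Matrix:
    """Hypothetical CEM-style step: convolve x with f rotated by 0/90/180/270 degrees
    (responses n1..n4) and fuse them by summation (the fusion rule is an assumption)."""
    responses = []
    kernel = f
    for _ in range(4):
        responses.append(conv2d_same(x, kernel))  # n1, n2, n3, n4 in turn
        kernel = rot90(kernel)
    h, w = len(x), len(x[0])
    return [[sum(n[i][j] for n in responses) for j in range(w)] for i in range(h)]


# Example: a 3x3 "ship" blob in the middle of a 5x5 map and an asymmetric kernel.
blob = [[1.0 if 1 <= i <= 3 and 1 <= j <= 3 else 0.0 for j in range(5)] for i in range(5)]
f = [[1.0, 2.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
y = rotational_conv(blob, f)
print(y[2][2])  # 16.0: the centre of the blob receives the strongest response
```

With this kernel the four rotated copies sum to the center-weighted kernel [[1, 2, 1], [2, 4, 2], [1, 2, 1]], so the fused response rotates together with the input (in 90° steps) and peaks at the middle of the blob. Summation is only one possible fusion choice; concatenating $n_1, \dots, n_4$ and applying a 1×1 convolution would be another.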

Paper Structure (18 sections, 10 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: Images from SSDD and HRSID. Ground-truth ships are marked with green bounding boxes.
  • Figure 2: The structure of the center enhancement module (CEM) and the neighbor attention module (NAM). In (a), $C_i$ ($i = 2, 3, 4, 5$) denotes the input, $f$ is the convolution kernel, and $R(k)f$ denotes the kernel $f$ rotated by $k$. $n_1, n_2, n_3, n_4$ are the results of convolving $C_i$ with the rotated kernels. In (b), long-range dependencies are computed from the feature maps of two adjacent layers to synthesize global features at different levels of granularity. $F_j$ and $F_{j+1}$ are the inputs of NAM. $F'_j$ and $F'_{j+1}$ capture the relationship between each pixel and the pixels in its horizontal and vertical directions. After the recurrent neighbor cross attention module (NCA), $F''_j$ and $F''_{j+1}$ contain global information.
  • Figure 3: Structure of FPN, PAFPN, and CC-FPN (Ours). Compared with FPN and PAFPN, CC-FPN introduces extra shallow and deep feature maps while maintaining the computational cost, enhancing detail and semantic information and giving a richer representation of both small and large ships.
  • Figure 4: Overall architecture of CASS-Det. CEM and NAM denote the center enhancement module and the neighbor attention module, respectively. CASS-Det consists of the backbone network, four center enhancement modules, two neighbor attention modules, the cross-connected feature pyramid network (CC-FPN), and the detection head. First, CEM highlights the central regions of the feature maps $C_i$ ($i = 2, 3, 4, 5$) using rotational convolution. Second, NAM combines global features of different granularity by computing long-range dependencies across adjacent layers. Third, CC-FPN provides richer semantic and contextual information through its cross-connected feature fusion structure.
  • Figure 5: Detection results of different methods on SSDD. Green boxes mark the ground truth and red boxes the detection results. Yellow circles indicate false detections, and blue circles indicate missed detections.
  • ...and 5 more figures
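The NAM description in the Figure 2 caption (row and column relations computed across two adjacent layers, then refined by a recurrent neighbor cross attention) resembles criss-cross attention. The sketch below illustrates that mechanism under stated assumptions: queries come from one level and keys/values from its neighbor, both already resized to the same H × W; the Q/K/V projections are omitted (identity); and the two levels are updated symmetrically for two loops. `cross_layer_criss_cross` and `neighbor_attention` are hypothetical names, not the authors' code.

```python
import math
from typing import List, Tuple

Vec = List[float]
FeatureMap = List[List[Vec]]  # indexed [row][column][channel]


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_layer_criss_cross(query_map: FeatureMap, kv_map: FeatureMap) -> FeatureMap:
    """One criss-cross pass: each pixel of query_map attends to the pixels in the same
    row and column of kv_map (the adjacent level, assumed resized to the same H x W).
    Q/K/V projections are omitted (identity) to keep the sketch short."""
    h, w = len(query_map), len(query_map[0])
    out: FeatureMap = []
    for i in range(h):
        row_out = []
        for j in range(w):
            q = query_map[i][j]
            # Criss-cross neighbourhood: all of row i plus column j (centre counted once).
            keys = [kv_map[i][c] for c in range(w)] + [kv_map[r][j] for r in range(h) if r != i]
            scores = [_dot(q, k) for k in keys]
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
            total = sum(weights)
            attended = [sum(wt * k[c] for wt, k in zip(weights, keys)) / total
                        for c in range(len(q))]
            row_out.append([a + b for a, b in zip(q, attended)])  # residual connection
        out.append(row_out)
    return out


def neighbor_attention(f_j: FeatureMap, f_j1: FeatureMap,
                       loops: int = 2) -> Tuple[FeatureMap, FeatureMap]:
    """Hypothetical NAM-style recurrence: each level attends to its neighbour level.
    Both updates use the previous loop's maps (simultaneous tuple assignment)."""
    for _ in range(loops):
        f_j, f_j1 = cross_layer_criss_cross(f_j, f_j1), cross_layer_criss_cross(f_j1, f_j)
    return f_j, f_j1
```

After one loop, a pixel has only seen its own row and column of the neighboring level; after the second loop, information from every position can reach it. This is why two recurrent passes are enough for a global receptive field in this scheme, at a cost of O(H + W) attention weights per pixel instead of O(H × W) for full self-attention.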