SMPISD-MTPNet: Scene Semantic Prior-Assisted Infrared Ship Detection Using Multi-Task Perception Networks

Chen Hu, Xiaogang Dong, Yian Huang, Lele Wang, Liang Xu, Tian Pu, Zhenming Peng

TL;DR

Infrared ship detection in complex scenes is hampered by false alarms and weak target semantics. The authors propose SMPISD-MTPNet, a three-stage framework that injects scene semantics via a Scene Semantic Extractor, processes features with a CSPDarknet-53 backbone and a simplified FPN, and performs detection with a Multi-Task Perception Module that includes a Scene Segmentation head and a Gradient-based Module. They also introduce Soft Fine-tuning to mitigate augmentation distortion and IRSDSS, a new dataset with scene masks, to enable scene-aware learning. Experiments on IRSDSS show consistent improvements over state-of-the-art methods across multiple AP metrics, with a smaller model footprint. The work advances robust infrared ship detection by incorporating scene priors and gradient-guided cues to suppress false alarms and recover small/dim targets, expanding the practical applicability of IRSD systems.

Abstract

Infrared ship detection (IRSD) has received increasing attention in recent years due to the robustness of infrared images to adverse weather. However, a large number of false alarms may occur in complex scenes. To address this challenge, we propose the Scene Semantic Prior-Assisted Multi-Task Perception Network (SMPISD-MTPNet), which comprises three stages: scene semantic extraction, deep feature extraction, and prediction. In the scene semantic extraction stage, a Scene Semantic Extractor (SSE) guides the network with features extracted on the basis of expert knowledge. In the deep feature extraction stage, a backbone network extracts deep features, which a fusion network then integrates to improve detection across targets of varying sizes. In the prediction stage, the Multi-Task Perception Module, which includes the Gradient-based Module and the Scene Segmentation Module, enables precise detection of small and dim targets within complex scenes. For training, we introduce the Soft Fine-tuning strategy to suppress the distortion caused by data augmentation. In addition, because no publicly available dataset is labelled for scenes, we introduce the Infrared Ship Dataset with Scene Segmentation (IRSDSS). Finally, we evaluate the network against state-of-the-art (SOTA) methods; the results indicate that SMPISD-MTPNet outperforms existing approaches. The source code and dataset are available at https://github.com/greekinRoma/KMNDNet.
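The three-stage data flow described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: every function name is hypothetical, the gradient-magnitude channel is only an assumed example of an expert-knowledge feature, average pooling stands in for the backbone, and the thresholded "scene mask" is a placeholder for a learned segmentation head. It shows only how the stages hand data to one another.

```python
import numpy as np

def scene_semantic_extractor(image):
    """Stage 1 stand-in (hypothetical): stack the image with a hand-crafted
    gradient-magnitude channel as an assumed expert-knowledge prior."""
    img = image.astype(np.float32)
    gy, gx = np.gradient(img)  # finite differences along rows, then columns
    return np.stack([img, np.hypot(gx, gy)], axis=0)  # shape (2, H, W)

def backbone(x, strides=(2, 4, 8)):
    """Stage 2a stand-in: average pooling at three strides replaces the CNN."""
    c, h, w = x.shape
    s_max = max(strides)
    assert h % s_max == 0 and w % s_max == 0, f"H and W must be multiples of {s_max}"
    return [x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4)) for s in strides]

def fuse(feats):
    """Stage 2b stand-in: top-down fusion, upsampling the coarser level by 2
    (nearest neighbour) and adding it to the next finer level."""
    fused = [feats[-1]]
    for f in reversed(feats[:-1]):
        up = fused[0].repeat(2, axis=1).repeat(2, axis=2)
        fused.insert(0, f + up)
    return fused

def multi_task_head(fused):
    """Stage 3 stand-in: one score map per scale plus a placeholder scene mask
    obtained by thresholding the gradient channel (not a learned head)."""
    det_maps = [f.mean(axis=0) for f in fused]
    grad = fused[0][1]
    scene_mask = (grad > grad.mean()).astype(np.uint8)
    return det_maps, scene_mask

def forward(image):
    """Chain the three stages on a single-channel image."""
    return multi_task_head(fuse(backbone(scene_semantic_extractor(image))))
```

Calling `forward` on a single-channel array whose height and width are multiples of 8 returns three score maps, ordered fine to coarse, and one binary mask at the finest scale.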

Paper Structure (27 sections, 8 equations, 16 figures, 6 tables, 1 algorithm)

Figures (16)

  • Figure 1: Infrared image and corresponding mask image: (a) Infrared image; (b) In the mask image corresponding to the infrared image, blue indicates land masks and green denotes cloud masks.
  • Figure 2: Statistics of instances in IRSDSS: (a) Instance Area, (b) Height, and (c) Aspect Ratio.
  • Figure 3: Diversity of scenery and weather: (a) ship with trails; (b) inshore and offshore scenes; (c) diverse clouds; and (d) sea wave.
  • Figure 4: Statistics of nearshore and offshore ships in each image.
  • Figure 5: SMPISD-MTPNet comprises three modules: 1) the SSE extracts scene semantics to enrich the inputs; 2) CSPDarknet-53 further processes these enhanced inputs; 3) the Multi-Task Perception Module analyzes the CSPDarknet-53 outputs and predicts the result.
  • ...and 11 more figures
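
The colour-coded scene masks described in the Figure 1 caption (blue for land, green for cloud) could be converted into integer class maps along the following lines. This is a minimal sketch under stated assumptions: the exact RGB values, the label indices, and the function name `decode_scene_mask` are hypothetical, and the actual IRSDSS mask encoding may differ.

```python
import numpy as np

# Assumed pure RGB colours; the caption only says "blue" and "green".
LAND_RGB = (0, 0, 255)
CLOUD_RGB = (0, 255, 0)

# Hypothetical label indices chosen for this sketch.
BACKGROUND, LAND, CLOUD = 0, 1, 2

def decode_scene_mask(mask_rgb):
    """Map an (H, W, 3) RGB mask to an (H, W) array of class indices.
    Pixels matching neither assumed colour are left as background."""
    mask_rgb = np.asarray(mask_rgb)
    labels = np.full(mask_rgb.shape[:2], BACKGROUND, dtype=np.uint8)
    labels[np.all(mask_rgb == LAND_RGB, axis=-1)] = LAND
    labels[np.all(mask_rgb == CLOUD_RGB, axis=-1)] = CLOUD
    return labels
```

Exact colour matching is fragile if masks are stored with lossy compression; a nearest-colour lookup would be more tolerant in that case.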