Improving underwater semantic segmentation with underwater image quality attention and muti-scale aggregation attention

Xin Zuo, Jiaran Jiang, Jifeng Shen, Wankou Yang

TL;DR

This work tackles degraded underwater image quality that hampers semantic segmentation by introducing UWSegFormer, a transformer-based framework that incorporates Underwater Image Quality Attention (UIQA) to emphasize high-quality semantic channels, Multi-scale Aggregation Attention (MAA) to fuse multi-scale features by leveraging high-level context for low-level details, and Edge Learning Loss (ELL) to enforce sharper boundary learning. Built on a SegFormer-like architecture, UIQA, MAA, and ELL collectively improve segmentation completeness and boundary clarity, achieving state-of-the-art results on SUIM ($mIoU=82.12\%$) and DUT ($mIoU=71.41\%$) with reduced computational cost. The approach demonstrates strong generalization to different backbones and offers practical impact for underwater navigation and seabed exploration where lighting and scattering degrade image quality. Overall, the paper advances underwater semantic segmentation by coupling quality-aware channel attention with cross-scale feature aggregation and boundary-focused supervision in a Transformer-based framework.

Abstract

Underwater image understanding is crucial for both submarine navigation and seabed exploration. However, the low illumination in underwater environments degrades the imaging quality, which in turn seriously deteriorates the performance of underwater semantic segmentation, particularly for outlining the object region boundaries. To tackle this issue, we present UnderWater SegFormer (UWSegFormer), a transformer-based framework for semantic segmentation of low-quality underwater images. Firstly, we propose the Underwater Image Quality Attention (UIQA) module. This module enhances the representation of high-quality semantic information in underwater image feature channels through a channel self-attention mechanism. To address the loss of imaging details caused by the underwater environment, the Multi-scale Aggregation Attention (MAA) module is proposed. This module aggregates sets of semantic features at different scales by extracting discriminative information from high-level features, thus compensating for the semantic loss of detail in underwater objects. Finally, during training, we introduce Edge Learning Loss (ELL) to enhance the model's learning of underwater object edges and improve its prediction accuracy. Experiments conducted on the SUIM and DUT-USEG (DUT) datasets demonstrate that the proposed method has advantages in segmentation completeness, boundary clarity, and subjective perceptual details when compared to SOTA methods. In addition, the proposed method achieves the highest mIoU of 82.12% and 71.41% on the SUIM and DUT datasets, respectively. Code will be available at https://github.com/SAWRJJ/UWSegFormer.
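The mIoU figures quoted above are mean intersection-over-union scores. As a minimal sketch of how that metric is conventionally computed (not the authors' evaluation code), the pure-Python example below averages per-class IoU over flattened label maps. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions; benchmark protocols usually accumulate counts over the whole test set rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in pred or target.

    pred and target are flat sequences of integer class labels, one per pixel.
    """
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: ignore classes absent from both maps
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 "image" flattened to six pixels, with three classes.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.2f}")  # prints "mIoU = 50.00"
```

In the toy case the per-class IoUs are 1/3, 2/3, and 1/2, which average to 0.5.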

Paper Structure

This paper contains 17 sections, 16 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Framework of the proposed UWSegFormer. The encoder combines a hierarchical Transformer with UIQA to extract coarse and fine features. The decoder uses MAA to exploit multi-level features and predicts semantic segmentation masks by aggregation. During training, ELL is incorporated into the loss function.
  • Figure 2: Detailed structure of the UIQA module.
  • Figure 3: Detailed structure of the MAA module, where the role of the ConvBN layer is to change the channel $C_i$ of the input features into $C$.
  • Figure 4: Details of SUIM and DUT datasets.
  • Figure 5: The effect of $N_M$ on the performance of the model. The blue line shows mIoU and the red line shows GFLOPs.
  • ...and 1 more figure
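The Figure 3 caption says the ConvBN layer maps the input channel count $C_i$ to a common $C$. One way to picture that projection is sketched below in dependency-free Python. It assumes a 1×1 convolution without bias followed by inference-mode batch normalization; the kernel size, the normalization statistics, and the names `conv1x1`, `batch_norm`, and `conv_bn` are hypothetical choices for illustration, not details taken from the paper.

```python
import math


def conv1x1(x, weight):
    """1x1 convolution: x is [C_in][H][W], weight is [C_out][C_in]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(row[i] * x[i][r][c] for i in range(c_in)) for c in range(w)]
         for r in range(h)]
        for row in weight
    ]


def batch_norm(x, mean, var, gamma, beta, eps=1e-5):
    """Inference-mode batch norm using fixed per-channel statistics."""
    return [
        [[gamma[k] * (v - mean[k]) / math.sqrt(var[k] + eps) + beta[k]
          for v in line]
         for line in channel]
        for k, channel in enumerate(x)
    ]


def conv_bn(x, weight, mean, var, gamma, beta):
    """Project channels with a 1x1 conv, then normalize each output channel."""
    return batch_norm(conv1x1(x, weight), mean, var, gamma, beta)


# C_i = 3 input channels on a 2x2 map, projected to C = 2 channels.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 1.0], [0.0, 1.0]],
     [[2.0, 2.0], [2.0, 2.0]]]
weight = [[1.0, 0.0, 0.0],   # output channel 0 copies input channel 0
          [0.0, 1.0, 1.0]]   # output channel 1 sums input channels 1 and 2
out = conv_bn(x, weight, mean=[0.0, 0.0], var=[1.0, 1.0],
              gamma=[1.0, 1.0], beta=[0.0, 0.0])
print(len(out), len(out[0]), len(out[0][0]))  # prints "2 2 2"
```

Because a 1×1 kernel mixes channels independently at each pixel, the spatial size is unchanged and only the channel dimension goes from $C_i$ to $C$, which is what lets features from different encoder stages be aggregated once they are resized to a common resolution.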