
Spatial-Frequency Dual Progressive Attention Network For Medical Image Segmentation

Zhenhuan Zhou, Along He, Yanlin Wu, Rui Yao, Xueshuo Xie, Tao Li

TL;DR

SF-UNet addresses core challenges in medical image segmentation by fusing multi-scale features and incorporating frequency-domain attention. It introduces MPCA for progressive cross-scale channel fusion and FSA for dual-domain learning that combines spatial and frequency information via 2D DFT/IDFT with a lightweight learnable filter. Across ISIC-2018, BUSI, and NKUT, SF-UNet achieves state-of-the-art DSC and IOU while maintaining a compact parameter count, demonstrating robust texture and boundary delineation. This dual-domain, cross-scale framework offers a practical, deployment-friendly path toward more accurate and reliable clinical segmentation.
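The frequency branch of FSA is described above as a 2D DFT, a lightweight learnable filter, and a 2D IDFT. The following is a minimal NumPy sketch of that idea. It assumes a GFNet-style global filter with one complex weight per channel and frequency bin, plus a hypothetical sigmoid spatial gate and a residual connection. The names frequency_filter and frequency_spatial_attention, the filter shape, and the way the two domains are combined are illustrative assumptions, not the published FSA design.

```python
import numpy as np

def frequency_filter(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Global frequency-domain filtering of a real feature map.

    x:      real feature map of shape (C, H, W)
    weight: complex filter of shape (C, H, W // 2 + 1); learnable in a real
            network, fixed here for illustration (assumed shape)
    """
    # 2D DFT over the spatial axes; rfft2 keeps only the non-redundant half
    # of the spectrum because x is real-valued.
    spec = np.fft.rfft2(x, axes=(-2, -1))
    # Element-wise product in frequency equals circular convolution in space,
    # so every output pixel depends on the whole input (global receptive field).
    spec = spec * weight
    # 2D inverse DFT back to the spatial domain; s restores odd widths exactly.
    return np.fft.irfft2(spec, s=x.shape[-2:], axes=(-2, -1))

def frequency_spatial_attention(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical dual-domain block: frequency branch gated by a spatial map."""
    freq = frequency_filter(x, weight)
    # Assumed spatial branch: channel-averaged map squashed to (0, 1).
    spatial_gate = 1.0 / (1.0 + np.exp(-x.mean(axis=0, keepdims=True)))
    # Residual combination so the block can start close to identity.
    return x + freq * spatial_gate
```

With an all-ones filter the frequency branch returns its input unchanged, while a filter that keeps only the DC bin (index [0, 0]) returns each channel's spatial mean, which illustrates how a filter that favors low frequencies discards edge and texture detail. In PyTorch, torch.fft.rfft2 and torch.fft.irfft2 play the same roles, and the filter would typically be an nn.Parameter.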

Abstract

In medical images, different types of lesions often differ significantly in shape and texture. Accurate medical image segmentation therefore demands deep learning models with robust multi-scale and boundary feature learning capabilities. However, previous networks still have limitations in addressing these issues. First, previous networks either fuse multi-level features simultaneously or employ deep supervision to enhance multi-scale learning. This may lead to feature redundancy and excessive computational overhead, which hinders both network training and clinical deployment. Second, most medical image segmentation networks learn features exclusively in the spatial domain, disregarding the abundant global information in the frequency domain. This results in a bias towards low-frequency components and neglects crucial high-frequency information. To address these problems, we introduce SF-UNet, a spatial-frequency dual-domain attention network. It comprises two main components: the Multi-scale Progressive Channel Attention (MPCA) block, which progressively extracts multi-scale features across adjacent encoder layers, and the lightweight Frequency-Spatial Attention (FSA) block, with only 0.05M parameters, which enables concurrent learning of texture and boundary features in both the spatial and frequency domains. We validate the effectiveness of the proposed SF-UNet on three public datasets. Experimental results show that, compared with previous state-of-the-art (SOTA) medical image segmentation networks, SF-UNet achieves the best performance, with improvements of up to 9.4% in DSC and 10.78% in IOU. Code will be released at https://github.com/nkicsl/SF-UNet.
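MPCA is described as progressively extracting multi-scale features across adjacent encoder layers. Its internals are not given in this section, so the sketch below shows one plausible reading, clearly an assumption: a squeeze-and-excitation style channel attention whose descriptor pools both the shallower level (downsampled 2x) and the deeper level, applied level by level so that each deeper level sees the already refined level above it. The names avg_pool2x, cross_scale_channel_attention, and progressive_attention, as well as the MLP shapes, are hypothetical.

```python
import numpy as np

def avg_pool2x(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling on a (C, H, W) map; H and W are assumed even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def cross_scale_channel_attention(shallow, deep, w1, w2):
    """Reweight the channels of `deep` using descriptors from both levels.

    shallow: (C1, 2H, 2W) features from encoder level i
    deep:    (C2, H, W) features from encoder level i + 1
    w1:      (hidden, C1 + C2), w2: (C2, hidden), weights of a small MLP
    """
    # Bring the shallower map down to the deeper resolution.
    pooled = avg_pool2x(shallow)
    # Channel descriptor: global average pooling over both adjacent levels.
    desc = np.concatenate([pooled, deep], axis=0).mean(axis=(1, 2))
    # Two-layer MLP (ReLU, then sigmoid) -> one weight in (0, 1) per deep channel.
    gate = sigmoid(w2 @ np.maximum(w1 @ desc, 0.0))
    return deep * gate[:, None, None]

def progressive_attention(feats, weights):
    """Apply the block pairwise, shallow to deep, reusing each refined level."""
    out = [feats[0]]
    for deep, (w1, w2) in zip(feats[1:], weights):
        out.append(cross_scale_channel_attention(out[-1], deep, w1, w2))
    return out
```

Because each step only looks at two adjacent levels, the cost grows linearly with depth, in contrast to fusing all encoder levels at once, which is the redundancy the abstract argues against.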

Paper Structure (18 sections, 12 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: The overall architecture of the proposed network and detailed structures of each block: (a) the overall structure of SF-UNet; (b) the structure of MPCA; (c) the structure of FSA. 2D-DFT stands for 2D Discrete Fourier Transform, and 2D-iDFT stands for 2D Inverse Discrete Fourier Transform.
  • Figure 2: Qualitative results on ISIC-2018 dataset. (a) represents the original image and (b) represents the corresponding ground truth. Black represents the background, i.e., normal skin, while white represents lesions. It can be observed that SF-UNet achieves the best performance.
  • Figure 3: Qualitative results on BUSI dataset. (a) represents the original image and (b) represents the corresponding ground truth. Black represents the background, while white represents tumors. It can be observed that SF-UNet achieves the best performance.
  • Figure 4: Qualitative results on NKUT dataset. (a) represents the original image and (b) represents the corresponding ground truth. Red represents Mandibular Wisdom Teeth (MWT), green represents Second Molars (SM), and yellow represents Alveolar Bone (AB).
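The qualitative comparisons above complement the DSC and IOU scores reported in the abstract. For reference, a minimal sketch of how these two overlap metrics are commonly computed on binary masks follows. The eps smoothing and the per-class loop for multi-class data such as NKUT (MWT, SM, AB) are assumptions, since the evaluation protocol is not given in this section; the function names are hypothetical.

```python
import numpy as np

def dice_and_iou(pred, gt, eps: float = 1e-7):
    """DSC = 2|P∩G| / (|P| + |G|), IoU = |P∩G| / |P∪G| for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # eps (assumed) keeps the ratios defined when both masks are empty.
    dsc = (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return float(dsc), float(iou)

def per_class_scores(pred_labels, gt_labels, classes):
    """Score each label of a multi-class map as its own binary mask."""
    pred_labels = np.asarray(pred_labels)
    gt_labels = np.asarray(gt_labels)
    return {c: dice_and_iou(pred_labels == c, gt_labels == c) for c in classes}
```

For a 4-pixel prediction and a 4-pixel ground truth that overlap in 2 pixels, DSC = 4/8 = 0.5 and IoU = 2/6 ≈ 0.333. The two metrics are monotonically related by DSC = 2·IoU / (1 + IoU), which is why they rank methods in the same order on a single mask, although averaged scores can differ.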