MSA$^2$Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation

Sina Ghorbani Kolahi, Seyed Kamal Chaharsooghi, Toktam Khatibi, Afshin Bozorgpour, Reza Azad, Moein Heidari, Ilker Hacihaliloglu, Dorit Merhof

TL;DR

This paper introduces MSA$^2$Net, a deep segmentation framework whose skip-connections fuse encoder and decoder features through a proposed Multi-Scale Adaptive Spatial Attention Gate (MASAG). On dermatological and radiological datasets, it outperforms or matches state-of-the-art (SOTA) methods.

Abstract

Medical image segmentation involves identifying and separating object instances in a medical image to delineate various tissues and structures, a task complicated by the significant variations in size, shape, and density of these features. Convolutional neural networks (CNNs) have traditionally been used for this task but have limitations in capturing long-range dependencies. Transformers, equipped with self-attention mechanisms, aim to address this problem. However, in medical image segmentation it is beneficial to merge both local and global features to effectively integrate feature maps across various scales, capturing both detailed features and broader semantic elements to handle variations in structures. In this paper, we introduce MSA$^2$Net, a new deep segmentation framework featuring an expedient design of skip-connections. These connections facilitate feature fusion by dynamically weighting and combining coarse-grained encoder features with fine-grained decoder feature maps. Specifically, we propose a Multi-Scale Adaptive Spatial Attention Gate (MASAG), which dynamically adjusts the receptive field (local and global contextual information) to ensure that spatially relevant features are selectively highlighted while minimizing background distractions. Extensive evaluations on dermatological and radiological datasets demonstrate that our MSA$^2$Net outperforms state-of-the-art (SOTA) works or matches their performance. The source code is publicly available at https://github.com/xmindflow/MSA-2Net.
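
To make the idea of "dynamically weighting and combining" encoder and decoder features concrete, here is a minimal NumPy sketch of a generic spatial attention gate. It is not the authors' MASAG module: the function name `spatial_gate_fusion`, the choice of mean pooling for local and global context, and the absence of any learned convolutions are all assumptions made for illustration. The authors' implementation is in the repository linked above.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def spatial_gate_fusion(enc, dec):
    """Toy spatial gate (hypothetical, parameter-free; not the paper's MASAG).

    enc, dec: feature maps of shape (C, H, W) at the same resolution.
    Returns the fused map (C, H, W) and the gate (1, H, W).
    """
    combined = enc + dec
    # Local context: per-pixel mean over channels (assumed stand-in for a local branch).
    local_ctx = combined.mean(axis=0, keepdims=True)          # (1, H, W)
    # Global context: mean over all channels and positions (assumed stand-in for a global branch).
    global_ctx = combined.mean(axis=(0, 1, 2), keepdims=True)  # (1, 1, 1)
    # Spatial gate in (0, 1): one weight per pixel, shared by all channels.
    gate = sigmoid(local_ctx - global_ctx)                     # (1, H, W)
    # Dynamic weighting: per-pixel convex combination of encoder and decoder features.
    fused = gate * enc + (1.0 - gate) * dec
    return fused, gate


# Usage with random stand-in features (4 channels, 8x8 resolution).
rng = np.random.default_rng(0)
enc = rng.standard_normal((4, 8, 8))
dec = rng.standard_normal((4, 8, 8))
fused, gate = spatial_gate_fusion(enc, dec)
print(fused.shape, gate.shape)
```

In a trained network the gate would be produced by learned layers rather than fixed pooling; the sketch only shows how a single per-pixel weight can blend the two feature streams.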

Paper Structure

This paper contains 20 sections, 4 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Our proposed segmentation network, called MSA$^2$Net, is composed of an encoder (using pretrained MaxViT blocks) and a decoder (comprising DAE-Former blocks in deeper layers and LKA blocks in shallow ones). The encoder-decoder feature fusion is performed via our novel MASAG module.
  • Figure 2: Visual comparison of the proposed approach with other methods on the Synapse multi-organ segmentation dataset.
  • Figure 3: Visual comparison of various techniques on the ISIC2018 skin lesion segmentation dataset. Ground-truth boundaries are shown in green and predicted boundaries in blue.
  • Figure A: Frequency response analysis of MSA$^2$Net with the MASAG module vs. MSA$^2$Net without the MASAG module.
  • Figure B: Feature visualization for two samples from the last layer of our model, covering different organs of the Synapse dataset (the first 4 rows show smaller organs and the remaining rows larger ones). The results show that, with the help of the MASAG module, MSA$^2$Net is robust to the shape, size, and density variations of organs learned at the decoder.