MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention

Hao Shao, Quansheng Zeng, Qibin Hou, Jufeng Yang

TL;DR

MCANet addresses a central difficulty in medical image segmentation, the wide variation in lesion and organ sizes and shapes, by introducing Multi-scale Cross-axis Attention (MCA) on top of an MSCAN encoder. MCA combines multi-scale strip-shaped convolutions with dual cross-axis attention to fuse horizontal and vertical context, enabling long-range interactions without heavy computation. The decoder aggregates multi-stage encoder features to produce high-resolution segmentation maps. The resulting model is compact (4M+ parameters, per the abstract) and achieves state-of-the-art or competitive results on skin lesion, nuclei, abdominal multi-organ, and polyp segmentation. Ablation studies indicate that combining multi-scale convolutions with cross-axis attention yields the largest gains in both accuracy and efficiency.

Abstract

Efficiently capturing multi-scale information and building long-range dependencies among pixels are essential for medical image segmentation because of the various sizes and shapes of the lesion regions or organs. In this paper, we present Multi-scale Cross-axis Attention (MCA) to solve the above challenging issues based on the efficient axial attention. Instead of simply connecting axial attention along the horizontal and vertical directions sequentially, we propose to calculate dual cross attentions between two parallel axial attentions to capture global information better. To process the significant variations of lesion regions or organs in individual sizes and shapes, we also use multiple convolutions of strip-shape kernels with different kernel sizes in each axial attention path to improve the efficiency of the proposed MCA in encoding spatial information. We build the proposed MCA upon the MSCAN backbone, yielding our network, termed MCANet. Our MCANet with only 4M+ parameters performs even better than most previous works with heavy backbones (e.g., Swin Transformer) on four challenging tasks, including skin lesion segmentation, nuclei segmentation, abdominal multi-organ segmentation, and polyp segmentation. Code is available at https://github.com/haoshao-nku/medical_seg.
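
The idea described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the kernel sizes (7, 11, 21), the box-filter stand-in for learned strip convolutions, the single attention head, the identity query/key/value projections, the choice of which path supplies the queries, and the residual sum are all assumptions made for readability. The function names are hypothetical.

```python
import numpy as np

def strip_conv(x, k, axis):
    """Box filter of odd length k along one spatial axis of a (C, H, W) array.
    Assumption: stands in for a learned depthwise strip convolution."""
    pad = [(0, 0)] * x.ndim
    pad[axis] = (k // 2, k // 2)          # zero padding keeps the size unchanged
    xp = np.pad(x, pad)
    n = x.shape[axis]
    out = np.zeros_like(x)
    for i in range(k):                    # sum k shifted copies, then average
        out += np.take(xp, np.arange(i, i + n), axis=axis)
    return out / k

def multi_scale_strip(x, axis, kernel_sizes=(7, 11, 21)):
    """Sum of strip convolutions with several kernel lengths (sizes assumed)."""
    return sum(strip_conv(x, k, axis) for k in kernel_sizes)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def axial_cross_attention(q_src, kv_src, axis):
    """Attention restricted to one spatial axis of (C, H, W) features.
    Queries come from q_src; keys and values come from kv_src."""
    other = 1 if axis == 2 else 2
    q = np.transpose(q_src, (other, axis, 0))     # (lines, L, C)
    kv = np.transpose(kv_src, (other, axis, 0))   # (lines, L, C)
    scores = q @ kv.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (lines, L, L)
    out = softmax(scores) @ kv                    # (lines, L, C)
    # Undo the transpose so the result is (C, H, W) again.
    return np.transpose(out, (2, 0, 1)) if axis == 2 else np.transpose(out, (2, 1, 0))

def mca_block(x):
    """Two parallel strip-convolution paths joined by dual cross attention."""
    fx = multi_scale_strip(x, axis=2)   # horizontal strips (along W)
    fy = multi_scale_strip(x, axis=1)   # vertical strips (along H)
    # Each path attends along its own axis using queries from the other path.
    top = axial_cross_attention(q_src=fy, kv_src=fx, axis=2)
    bottom = axial_cross_attention(q_src=fx, kv_src=fy, axis=1)
    return x + top + bottom             # residual sum (assumed)

x = np.random.default_rng(0).standard_normal((4, 8, 6))
y = mca_block(x)
print(y.shape)
```

Because each attention map spans a single row or column, its size grows with H·W² or W·H² rather than (H·W)², which is the usual efficiency argument for axial attention. Crossing the queries between the two paths is what distinguishes this sketch from simply chaining a horizontal and a vertical axial attention in sequence.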

Paper Structure

This paper contains 19 sections, 7 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Performance and FLOPs of our method compared with other mainstream approaches on four benchmarks. Our tiny-sized model achieves state-of-the-art performance on all four tasks (skin lesion segmentation, nuclei segmentation, abdominal multi-organ segmentation, and polyp segmentation) while being more computationally efficient.
  • Figure 2: Detailed visual comparison of our method with recent popular medical segmentation methods (e.g., MISSFormer [huang2022missformer] and Swin-UNet [cao2021swin]) on the Synapse dataset. The blue rectangular boxes highlight the segmentation details produced by each method. Our method performs better than the other methods.
  • Figure 3: Overall architecture of the proposed MCANet. We take the MSCAN network proposed in SegNeXt [guo2022segnext] as our encoder because of its capability of capturing multi-scale features. The feature maps from the last three stages of the encoder are upsampled and then concatenated to form the input of the decoder. Our decoder is based on multi-scale cross-axis attention, which takes advantage of both multi-scale convolutional features and axial attention.
  • Figure 4: Detailed structure of the proposed multi-scale cross-axis attention decoder. Our decoder contains two parallel paths, each of which contains multi-scale 1D convolutions and cross-axis attention to aggregate the spatial information. Note that we do not add any activation functions in the decoder.
  • Figure 5: Segmentation results of different methods on the DSB2018 dataset and ISIC-2018 dataset.
  • ...and 4 more figures
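
The feature aggregation described in the Figure 3 caption (last three encoder stages, upsampled and concatenated) can be illustrated with a short NumPy sketch. The channel counts, spatial sizes, and the use of nearest-neighbor upsampling are assumptions chosen for the example; the caption does not specify them, and the function names are hypothetical.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling of a (C, H, W) array by an integer factor.
    Assumption: the actual interpolation mode is not stated in the caption."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def aggregate_stages(feats):
    """Bring every stage to the resolution of the first (largest) feature map,
    then concatenate along the channel axis."""
    target_h = feats[0].shape[1]
    ups = [upsample_nearest(f, target_h // f.shape[1]) for f in feats]
    return np.concatenate(ups, axis=0)

# Hypothetical shapes for the last three encoder stages.
f2 = np.ones((64, 32, 32))
f3 = np.ones((160, 16, 16))
f4 = np.ones((256, 8, 8))
fused = aggregate_stages([f2, f3, f4])
print(fused.shape)  # (480, 32, 32)
```

The fused tensor carries coarse, semantically rich channels next to finer, higher-resolution ones, which is what lets a single decoder block see several scales at once.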