MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention
Hao Shao, Quansheng Zeng, Qibin Hou, Jufeng Yang
TL;DR
MCANet addresses the wide variation in lesion and organ sizes in medical image segmentation by introducing Multi-scale Cross-axis Attention (MCA) on top of an MSCAN encoder. MCA combines multi-scale strip-shaped convolutions with dual cross attention between parallel horizontal and vertical axial-attention paths, so it captures long-range context without heavy computation. The decoder aggregates multi-stage encoder features to produce high-resolution segmentation maps. The resulting model is compact, with only 4M+ parameters, and achieves state-of-the-art or competitive results on skin lesion, nuclei, abdominal multi-organ, and polyp segmentation. Ablation studies indicate that combining the multi-scale convolutions with cross-axis attention yields the largest performance gains.
Abstract
Efficiently capturing multi-scale information and building long-range dependencies among pixels are essential for medical image segmentation because lesion regions and organs vary widely in size and shape. In this paper, we present Multi-scale Cross-axis Attention (MCA), which addresses both needs by building on efficient axial attention. Instead of simply connecting axial attention along the horizontal and vertical directions sequentially, we propose to calculate dual cross attentions between two parallel axial attentions to better capture global information. To handle the large variation in the sizes and shapes of lesion regions and organs, we also use multiple strip-shaped convolutions with different kernel sizes in each axial attention path, which improves the efficiency of the proposed MCA in encoding spatial information. We build the proposed MCA upon the MSCAN backbone, yielding our network, termed MCANet. Our MCANet, with only 4M+ parameters, performs even better than most previous works with heavy backbones (e.g., Swin Transformer) on four challenging tasks: skin lesion segmentation, nuclei segmentation, abdominal multi-organ segmentation, and polyp segmentation. Code is available at https://github.com/haoshao-nku/medical_seg.
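To make the idea of dual cross attention between two parallel axial paths concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation (see the repository above for that). The kernel sizes (7, 11, 21), the mean filters standing in for learned strip convolutions, the single attention head without learned projections, the choice of which axis each branch attends along, and all function names are assumptions of this sketch.

```python
import numpy as np

def strip_conv(x, k, axis):
    """Mean filter of length k along `axis` of a (C, H, W) map, zero padded.
    Stand-in for a learned depthwise strip convolution (assumption)."""
    pad = k // 2
    pad_width = [(0, 0)] * x.ndim
    pad_width[axis] = (pad, pad)
    xp = np.pad(x, pad_width)
    n = x.shape[axis]
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        # Add the input shifted by (i - pad) positions along `axis`.
        out += np.take(xp, np.arange(i, i + n), axis=axis)
    return out / k

def multi_scale_strip(x, axis, kernel_sizes=(7, 11, 21)):
    """Sum strip filters of several lengths plus the identity (sizes are assumed)."""
    return x + sum(strip_conv(x, k, axis) for k in kernel_sizes)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def row_attention(q, kv):
    """Single-head attention along W, independently per row; q, kv: (C, H, W).
    Keys and values are both `kv` (no learned projections in this sketch)."""
    c = q.shape[0]
    scores = np.einsum("chw,chv->hwv", q, kv) / np.sqrt(c)
    attn = softmax(scores, axis=-1)
    return np.einsum("hwv,chv->chw", attn, kv)

def col_attention(q, kv):
    """Same as row_attention, but along H: swap H and W, attend, swap back."""
    qt, kvt = q.transpose(0, 2, 1), kv.transpose(0, 2, 1)
    return row_attention(qt, kvt).transpose(0, 2, 1)

def cross_axis_attention(x):
    """Two parallel axial paths whose queries are exchanged (dual cross attention)."""
    fx = multi_scale_strip(x, axis=2)  # horizontal strips (1 x k)
    fy = multi_scale_strip(x, axis=1)  # vertical strips (k x 1)
    top = row_attention(q=fy, kv=fx)     # vertical-path queries, horizontal-path keys/values
    bottom = col_attention(q=fx, kv=fy)  # horizontal-path queries, vertical-path keys/values
    return x + top + bottom              # residual fusion (assumption)

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 10))   # toy (C, H, W) feature map
out = cross_axis_attention(feat)
print(out.shape)  # (4, 8, 10)
```

Because each branch attends along a single axis, the score tensors have shapes (H, W, W) and (W, H, H) rather than (HW, HW) as in full self-attention, which is where the efficiency of axial attention comes from. Exchanging the queries between the two paths lets each pixel mix information from both directions within one block.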
