Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation

Chao Ma; Ziyang Wang

TL;DR

This work tackles the challenge of scarce annotations in medical image segmentation by introducing Semi-Mamba-UNet, which combines a visual Mamba-based UNet with a CNN-based UNet within a semi-supervised framework. The method employs pixel-level cross-supervision between the two backbones and a pixel-level contrastive learning component, with the training loss decomposed into $\mathcal{L}_{\rm sup}$, $\mathcal{L}_{\rm semi}$, and $\mathcal{L}_{\rm contra}$ to leverage both labeled and unlabeled data. A cross-architecture SSL strategy enables mutual pseudo-labeling and cross-training, while a pair of pixel-level projectors enhances feature learning on unlabeled samples. Evaluations on cardiac MRI (ACDC) and prostate MRI (PROMISE12) datasets demonstrate superior performance over seven SSL baselines, with open-source code for reproducibility.
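
As a hedged illustration of how the three terms might be combined, a weighted sum is a common choice in semi-supervised segmentation. The weights $\lambda_1$ and $\lambda_2$ and the symbol $\mathcal{L}_{\rm total}$ are assumptions introduced here for illustration; this summary does not state the exact weighting:

$$\mathcal{L}_{\rm total} = \mathcal{L}_{\rm sup} + \lambda_1 \, \mathcal{L}_{\rm semi} + \lambda_2 \, \mathcal{L}_{\rm contra}$$

Under this reading, $\mathcal{L}_{\rm sup}$ is computed on labeled samples only, while $\mathcal{L}_{\rm semi}$ and $\mathcal{L}_{\rm contra}$ can also use unlabeled samples.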

Abstract

Medical image segmentation is essential in diagnostics, treatment planning, and healthcare, and deep learning has driven promising advances. The convolutional neural network (CNN) excels at capturing local image features, whereas the Vision Transformer (ViT) models long-range dependencies through multi-head self-attention. Despite these strengths, both CNNs and ViTs struggle to process long-range dependencies in medical images efficiently and often require substantial computational resources. This issue, combined with the high cost and limited availability of expert annotations, poses a significant obstacle to precise segmentation. To address these challenges, this study introduces Semi-Mamba-UNet, which integrates a purely visual Mamba-based U-shaped encoder-decoder architecture with a conventional CNN-based UNet in a semi-supervised learning (SSL) framework. The two networks generate pseudo-labels and cross-supervise one another at the pixel level simultaneously, drawing inspiration from consistency regularisation techniques. Furthermore, we introduce a self-supervised pixel-level contrastive learning strategy that employs a pair of projectors to further enhance feature learning, especially on unlabelled data. Semi-Mamba-UNet was comprehensively evaluated on two publicly available segmentation datasets and compared with seven other SSL frameworks using either a CNN- or ViT-based UNet as the backbone network; the proposed method achieved superior performance. The source code of Semi-Mamba-UNet, all baseline SSL frameworks, the CNN- and ViT-based networks, and the two corresponding datasets are made publicly accessible.
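
The pixel-level cross-supervision described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the toy logits, and the use of hard argmax pseudo-labels with a cross-entropy loss are all assumptions made for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_labels(logit_map):
    """Hard pseudo-label per pixel: index of the largest logit."""
    return [max(range(len(px)), key=px.__getitem__) for px in logit_map]

def cross_entropy(logit_map, labels):
    """Mean pixel-wise cross-entropy between logits and integer labels."""
    losses = [-math.log(softmax(px)[y]) for px, y in zip(logit_map, labels)]
    return sum(losses) / len(losses)

def cross_supervision_loss(logits_a, logits_b):
    """Each network is trained on the other's hard pseudo-labels.

    In a real training loop the pseudo-labels would be treated as
    constants (no gradient flows through them); that is assumed here.
    """
    return (cross_entropy(logits_a, pseudo_labels(logits_b))
            + cross_entropy(logits_b, pseudo_labels(logits_a)))

# Toy example (hypothetical values): 3 pixels, 2 classes,
# flattened to a list of per-pixel logits for each network.
mamba_logits = [[2.0, 0.1], [0.3, 1.5], [1.0, 1.2]]
cnn_logits = [[1.8, 0.2], [0.1, 2.0], [1.4, 0.6]]
loss = cross_supervision_loss(mamba_logits, cnn_logits)
```

The loss shrinks when the two networks agree confidently and grows where they disagree, as on the third toy pixel, which is what pushes the two architectures toward consistent predictions on unlabelled images.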

Paper Structure (9 sections, 9 equations, 10 figures, 7 tables)

Introduction
Related Work
Mamba in Medical Image Segmentation
Medical Image Segmentation with Limited Annotations
Methodology
Mamba-UNet
Pixel-Level Cross-Supervised Learning
Experiments and Results
Conclusion

Figures (10)

Figure 1: Development history of semi-supervised learning, supervised learning for medical image segmentation, and network architecture. Sources: CNN [long2015fully], Transformer [liu2021swin], Mamba [liu2024vmamba], UNet [ronneberger2015u], Swin-UNet [cao2022swin], Mamba-UNet [wang2024mamba], CPS [chen2021semi], cross-teaching CNN & ViT [luo2021semi], and the proposed Semi-Mamba-UNet.
Figure 2: Semi-Mamba-UNet: Framework for pixel-level contrastive cross-supervised Visual Mamba-based UNet for semi-supervised medical image segmentation.
Figure 3: Segmentation backbone networks in this study. (a) Encoder-decoder style segmentation network. (b) Two-layer CNN-based network block of UNet. (c) Two-layer Swin ViT-based network block of Swin-UNet. (d) Two-layer Visual Mamba-based network block of Mamba-UNet.
Figure 4: Three randomly selected images from the cardiac MRI test set, with ground truth and the corresponding segmentation results of all baseline methods and Semi-Mamba-UNet, when three cases were treated as labelled data.
Figure 5: Three randomly selected images from the cardiac MRI test set, with ground truth and the corresponding segmentation results of all baseline methods and Semi-Mamba-UNet, when five cases were treated as labelled data.
...and 5 more figures
