Mamba-Sea: A Mamba-based Framework with Global-to-Local Sequence Augmentation for Generalizable Medical Image Segmentation

Zihan Cheng, Jintao Guo, Jian Zhang, Lei Qi, Luping Zhou, Yinghuan Shi, Yang Gao

TL;DR

This work tackles domain generalization (DG) in medical image segmentation, where distribution shifts across sites degrade performance. It introduces Mamba-Sea, a Mamba-based framework that couples global appearance variation augmentation (GVA) with local sequence-wise augmentation (LSA) of token style, diversifying both the global appearance of the inputs and the sequence-level token dependencies. Semantic consistency training additionally enforces domain-invariant predictions. On the Fundus and Prostate DG benchmarks, plus a large-scale skin lesion dataset, the method reports state-of-the-art Dice scores and favorable computational efficiency, improving on CNN- and ViT-based DG methods as well as SAM-based baselines. The approach generalizes robustly under domain shifts and offers a scalable, modular way to use Mamba for DG in medical image segmentation, with future directions that include feature disentanglement and foundation-model integration.
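The Dice scores mentioned above measure the overlap between a predicted mask and the ground truth. The following is a minimal sketch for binary masks, not the paper's evaluation code: the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made here for illustration.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * intersection / denom)
```

A value of 1.0 means the masks coincide exactly and 0.0 means they do not overlap; the percentages quoted for the benchmarks are this quantity scaled by 100.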

Abstract

To segment medical images under distribution shifts, domain generalization (DG) has emerged as a promising setting in which models trained on source domains must generalize to unseen target domains. Existing DG methods are mainly based on CNN or ViT architectures. Recently, advanced state space models, represented by Mamba, have shown promising results in various supervised medical image segmentation tasks. The success of Mamba is primarily owing to its ability to capture long-range dependencies while keeping complexity linear in the input sequence length, making it a promising alternative to CNNs and ViTs. Inspired by this success, in this paper we explore the potential of the Mamba architecture to address distribution shifts in DG for medical image segmentation. Specifically, we propose a novel Mamba-based framework, Mamba-Sea, which incorporates global-to-local sequence augmentation to improve the model's generalizability under domain shifts. Mamba-Sea introduces a global augmentation mechanism designed to simulate potential variations in appearance across different sites, aiming to suppress the model's learning of domain-specific information. At the local level, we propose a sequence-wise augmentation along input sequences, which perturbs the style of tokens within random continuous sub-sequences by modeling and resampling style statistics associated with domain shifts. To the best of our knowledge, Mamba-Sea is the first work to explore the generalization of Mamba for medical image segmentation, providing an advanced and promising Mamba-based architecture with strong robustness to domain shifts. Remarkably, our proposed method is the first to surpass a Dice coefficient of 90% on the Prostate dataset, exceeding the previous SOTA of 88.61%. The code is available at https://github.com/orange-czh/Mamba-Sea.
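The local, sequence-wise augmentation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that "style statistics" are the channel-wise mean and standard deviation of the tokens in the chosen sub-sequence, and that resampling means perturbing them with Gaussian noise. The function name `perturb_subsequence_style` and the parameters `min_len` and `noise_scale` are hypothetical.

```python
import numpy as np


def perturb_subsequence_style(tokens, rng, min_len=2, noise_scale=0.1, eps=1e-6):
    """Re-style one random contiguous sub-sequence of a (L, C) token array.

    Assumes L >= min_len. Returns the augmented copy and the (start, stop)
    bounds of the perturbed sub-sequence.
    """
    seq_len = tokens.shape[0]
    # Pick a random contiguous window [start, stop) inside the sequence.
    length = int(rng.integers(min_len, seq_len + 1))
    start = int(rng.integers(0, seq_len - length + 1))
    stop = start + length

    segment = tokens[start:stop].astype(np.float64)
    # Assumed style statistics: per-channel mean and standard deviation.
    mu = segment.mean(axis=0, keepdims=True)
    sigma = segment.std(axis=0, keepdims=True) + eps

    # Assumed resampling rule: Gaussian perturbation of both statistics.
    new_mu = mu + noise_scale * sigma * rng.standard_normal(mu.shape)
    new_sigma = np.abs(sigma * (1.0 + noise_scale * rng.standard_normal(sigma.shape)))

    out = tokens.astype(np.float64, copy=True)
    # Normalize the window, then apply the resampled statistics.
    out[start:stop] = (segment - mu) / sigma * new_sigma + new_mu
    return out, (start, stop)


# Example: augment a sequence of 16 tokens with 4 channels each.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 4))
augmented, (start, stop) = perturb_subsequence_style(tokens, rng)
```

Tokens outside the selected window are left untouched, so only the style of one continuous sub-sequence changes while its normalized content is preserved.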

Paper Structure

This paper contains 26 sections, 17 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Typical images from three public medical datasets with increasing domain gaps, and a comparison of the segmentation performance of Mamba-Sea with UNet, Swin-Unet, Med-SA, and VM-Unet on these three datasets.
  • Figure 2: The overall structure of Mamba-Sea. (a) shows the GVA module. As shown in (b), both the original images and the globally augmented images are fed into the SSM-based generalizable segmentation network for training. (c) shows the core components of the segmentation network, in which the LSA module shown in (d) performs augmentation at the local level.
  • Figure 3: Examples of original images alongside their augmented versions after applying the proposed global appearance variation augmentation (GVA) on the Fundus dataset. The PSNR and SSIM values indicate the pixel-wise and structural similarities between the original and augmented images.
  • Figure 4: Structure of LSA. For each sequence, a unique mask is generated to enhance the diversity of the augmentation.
  • Figure 5: Visualization of the segmentation results of different methods on the Fundus dataset. The yellow contours indicate the ground-truth boundaries, while the semi-transparent overlays are predictions.
  • ...and 3 more figures
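
The caption of Figure 3 uses PSNR as a pixel-wise similarity measure between an original image and its augmented version. As a rough illustration, PSNR can be computed as below. This is a generic sketch, not code from the paper; the function name `psnr` and the default `max_value=255.0` (8-bit images) are assumptions.

```python
import numpy as np


def psnr(original, augmented, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    original = np.asarray(original, dtype=np.float64)
    augmented = np.asarray(augmented, dtype=np.float64)
    mse = np.mean((original - augmented) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Higher values mean the augmented image stays closer to the original at the pixel level; SSIM, the structural measure named in the same caption, is not covered by this sketch.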