SSFMamba: Learning Symmetry-driven Spatial-Frequency Modeling for Physically Consistent 3D Medical Image Segmentation

Bo Zhang, Yifan Zhang, Shuo Yan, Yu Bai, Zheng Zhang, Wu Liu, Wendong Wang, Yongdong Zhang

TL;DR

SSFMamba is a Mamba-based, symmetry-driven spatial-frequency fusion framework for 3D medical image segmentation. It performs especially well on low-contrast organs such as the pancreas (81.97% Dice), which the authors present as evidence of its potential as a unified, physically consistent perception framework for diverse 3D clinical applications.

Abstract

Accurate 3D medical image segmentation requires a delicate balance between fine-grained local details and global contextual understanding. While spatial-domain models often struggle with long-range dependencies, existing frequency-based approaches frequently overlook intrinsic spectral properties such as Hermitian symmetry, leading to suboptimal feature integration. In this paper, we propose SSFMamba, a Mamba-based Symmetry-driven Spatial-Frequency fusion framework tailored for 3D medical imaging. Our architecture employs a complementary dual-branch design: the spatial branch preserves intricate anatomical textures, while the frequency branch captures global contextual dependencies in the frequency domain. A core innovation is the 3D Multi-Directional Scanning Mechanism (MDSM), which integrates Hermitian symmetry with the causal nature of State Space Models (SSMs) to enable direction-aware global modeling. Crucially, by shifting the modeling focus to frequency-domain spectral components, SSFMamba captures the underlying structural characteristics of anatomical tissues. This yields a highly adaptable framework that performs well in both MRI and CT applications despite the significant differences in their intensity distributions. Extensive evaluations on the BraTS2020, BraTS2023, and BTCV datasets demonstrate that SSFMamba consistently outperforms state-of-the-art methods. Notably, our approach achieves exceptional performance on low-contrast organs such as the pancreas (81.97% Dice), underscoring its potential as a unified and physically consistent perception framework for diverse 3D clinical applications.
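The Hermitian symmetry mentioned in the abstract is a standard property of the discrete Fourier transform: for a real-valued volume, the coefficient at frequency (-u, -v, -w) is the complex conjugate of the coefficient at (u, v, w). The sketch below checks this numerically on a small random volume with a naive 3D DFT. It is an illustration only, not the authors' code; the names `dft3` and `hermitian_error` and the volume size are assumptions made for this example.

```python
import cmath
import itertools
import random


def dft3(volume):
    """Naive 3D DFT of a real volume stored as nested lists volume[z][y][x]."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    spectrum = {}
    for u, v, w in itertools.product(range(D), range(H), range(W)):
        total = 0j
        for z, y, x in itertools.product(range(D), range(H), range(W)):
            phase = -2j * cmath.pi * (u * z / D + v * y / H + w * x / W)
            total += volume[z][y][x] * cmath.exp(phase)
        spectrum[(u, v, w)] = total
    return spectrum


def hermitian_error(spectrum, shape):
    """Largest deviation from F(-u, -v, -w) == conj(F(u, v, w))."""
    D, H, W = shape
    return max(
        abs(value - spectrum[((-u) % D, (-v) % H, (-w) % W)].conjugate())
        for (u, v, w), value in spectrum.items()
    )


random.seed(0)
shape = (3, 4, 5)  # hypothetical toy size; real medical volumes are far larger
volume = [
    [[random.random() for _ in range(shape[2])] for _ in range(shape[1])]
    for _ in range(shape[0])
]
spectrum = dft3(volume)
error = hermitian_error(spectrum, shape)
print(f"max Hermitian symmetry error: {error:.2e}")
```

The printed error should be on the order of floating-point rounding, which means roughly half of the spectrum is redundant for real inputs. A practical implementation would use a library FFT rather than this O(N^2) loop.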

Paper Structure

This paper contains 20 sections, 7 equations, 7 figures, 5 tables, 4 algorithms.

Figures (7)

  • Figure 1: Feature maps of (a) the original image, (b) the spatial domain, (c) the frequency domain, and (d) the fused multi-domain representation; and (e) the original image, (f) the annotation, (g) our feature map, and (h) SegMamba's feature map. Spatial-domain features provide a comprehensive representation but exhibit blurred boundaries; frequency-domain features provide clear boundaries and high-contrast details; and the fused features emphasize the tumor region and its peripheral contours by integrating both representations.
  • Figure 2: The overall architecture of the proposed SSFMamba. The encoder utilizes convolutional layers and multiple MDIF Encoders to extract multi-scale features. Each MDIF Encoder consists of two MDIF Blocks and a downsampling layer, designed to capture and integrate multi-domain information. Notably, the MDIF Blocks are based on Mamba Blocks. In the decoder, transposed convolutions upsample the feature maps, which are then concatenated with corresponding skip connection feature maps from the encoder. Subsequent convolutional layers perform feature fusion to restore high-resolution representations. Additionally, residual connections are incorporated to facilitate effective deep network training.
  • Figure 3: The detailed architecture of the proposed MDIF Block. This block consists of two branches: one extracts spatial-domain features while the other extracts frequency-domain features, targeting the capture of global context and local details, respectively. The features from both branches are then fused using an MLP module and combined with the input feature maps to generate the output feature maps.
  • Figure 4: The detailed architecture of the proposed Frequency Mamba, a 3D multi-directional scanning mechanism. In this mechanism, features are extracted along different orientations using distinct Mamba modules; different colors represent different slices. The resulting sequences are then transposed back to their original 3D arrangement. After applying the IFFT, the outputs are added element-wise at the same spatial positions.
  • Figure 5: Qualitative comparison on the BraTS2023 Dataset. The annotated regions are categorized into three classes: red indicates NCR (necrotic tumor core), blue indicates ET (GD-enhancing tumor), and green indicates ED (the peritumoral edematous/invaded tissue). Below each image, a magnified view of the corresponding region is provided to highlight the differences in fine-grained segmentation performance across methods.
  • ...and 2 more figures
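
The scan, restore, and add pattern described in the Figure 4 caption can be sketched in a few lines. This is a simplified illustration under several assumptions, not the paper's implementation: the three axis orderings are hypothetical (the caption does not list the exact orientations), a causal running mean stands in for each Mamba module, and the FFT/IFFT steps are omitted so that only the flattening, restoring, and element-wise addition are shown. All function names are made up for this example.

```python
from itertools import product


def scan_order(shape, axis_order):
    """List voxel coordinates so that the last axis in axis_order varies fastest."""
    coords = []
    for idx in product(*(range(shape[a]) for a in axis_order)):
        coord = [0, 0, 0]
        for axis, i in zip(axis_order, idx):
            coord[axis] = i
        coords.append(tuple(coord))
    return coords


def causal_running_mean(seq):
    """Stand-in for a causal sequence model: output t depends only on inputs 0..t."""
    out, total = [], 0.0
    for t, value in enumerate(seq, start=1):
        total += value
        out.append(total / t)
    return out


def multi_directional_scan(volume, shape, axis_orders):
    """Scan a volume (dict: coord -> float) along several orders and sum the results."""
    fused = {coord: 0.0 for coord in volume}
    for axis_order in axis_orders:
        coords = scan_order(shape, axis_order)
        sequence = [volume[c] for c in coords]       # 3D volume -> 1D sequence
        processed = causal_running_mean(sequence)    # per-direction sequence model
        for c, value in zip(coords, processed):      # 1D -> original 3D position
            fused[c] += value                        # element-wise addition
    return fused


shape = (2, 2, 3)
all_coords = list(product(*(range(n) for n in shape)))
volume = {c: float(sum(c)) for c in all_coords}
axis_orders = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]  # assumed orientations
fused = multi_directional_scan(volume, shape, axis_orders)
print(len(fused), "voxels fused from", len(axis_orders), "scan directions")
```

Because each direction visits the voxels in a different order, the same causal model sees a different context at each position, and summing the restored outputs gives every voxel information from all scan directions.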