
SpectralMamba-UNet: Frequency-Disentangled State Space Modeling for Texture-Structure Consistent Medical Image Segmentation

Fuhao Zhang, Lei Liu, Jialin Zhang, Ya-Nan Zhang, Nan Mu

TL;DR

This work proposes SpectralMamba-UNet, a novel frequency-disentangled framework that decouples the learning of structural and textural information in the spectral domain. It also introduces a Spectral Channel Reweighting mechanism that forms channel-wise frequency-aware attention, and a Spectral-Guided Fusion module that performs adaptive multi-scale fusion in the decoder.
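The summary does not say how SCR computes its channel weights. As a rough illustration only, the sketch below assumes that each channel's attention over the two bands is a softmax of the band energies of the low- and high-frequency branch outputs. The function name `spectral_channel_reweight`, the energy descriptor, and the `temperature` parameter are hypothetical. A trained model would more likely produce these logits with learned layers; the energies are used directly here so the example stays parameter-free.

```python
import numpy as np


def spectral_channel_reweight(low, high, temperature=1.0):
    """Hypothetical SCR sketch (not the authors' implementation).

    low, high: arrays of shape (C, H, W) from the low- and high-frequency
    branches. Returns the reweighted feature map (C, H, W) and the per-channel
    band weights (C, 2), whose rows sum to 1.
    """
    # Per-channel band descriptor: mean squared activation of each branch.
    e_low = np.mean(low ** 2, axis=(1, 2))
    e_high = np.mean(high ** 2, axis=(1, 2))
    logits = np.stack([e_low, e_high], axis=1) / temperature  # (C, 2)
    # Numerically stable softmax over the two bands, separately per channel.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Broadcast each channel's two weights over its spatial positions.
    fused = weights[:, 0, None, None] * low + weights[:, 1, None, None] * high
    return fused, weights


rng = np.random.default_rng(0)
low = rng.normal(size=(4, 8, 8))          # strong low-frequency branch
high = 0.1 * rng.normal(size=(4, 8, 8))   # weak high-frequency branch
fused, w = spectral_channel_reweight(low, high)
print(fused.shape, w.shape)  # (4, 8, 8) (4, 2)
```

Because the weights are computed per channel, one channel can lean on boundary detail while another leans on global context, which is the kind of balancing the SCR description suggests.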

Abstract

Accurate medical image segmentation requires effective modeling of both global anatomical structures and fine-grained boundary details. Recent state space models (e.g., Vision Mamba) model long-range dependencies efficiently. However, their one-dimensional serialization of 2-D images weakens local spatial continuity and high-frequency representation. To address this, we propose SpectralMamba-UNet, a novel frequency-disentangled framework that decouples the learning of structural and textural information in the spectral domain. Our Spectral Decomposition and Modeling (SDM) module applies the discrete cosine transform (DCT) to decompose features into low- and high-frequency components: the low-frequency components drive global context modeling through a frequency-domain Mamba, while the high-frequency components preserve boundary-sensitive details. To balance the spectral contributions, we introduce a Spectral Channel Reweighting (SCR) mechanism that forms channel-wise frequency-aware attention, and a Spectral-Guided Fusion (SGF) module that performs adaptive multi-scale fusion in the decoder. Experiments on five public benchmarks (Synapse, ACDC, EAT, IA, and DRIVE) demonstrate consistent improvements across diverse modalities and segmentation targets, validating the effectiveness and generalizability of the approach.
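The DCT split at the heart of SDM can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the square low-frequency mask controlled by `cutoff` and the helper names are hypothetical. In particular, `linear_scan` is a plain linear state-space recurrence standing in for the frequency-domain Mamba; real Mamba layers use input-dependent (selective) parameters and learned projections.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d


def spectral_split(x, cutoff):
    """Split a (H, W) map into low- and high-frequency parts via a 2-D DCT.

    Coefficients whose row and column indices are both < cutoff count as
    low frequency (hypothetical square mask). Returns the spatial-domain
    low and high maps plus the low-frequency coefficient block.
    """
    dh, dw = dct_matrix(x.shape[0]), dct_matrix(x.shape[1])
    coeffs = dh @ x @ dw.T                # forward 2-D DCT-II
    mask = np.zeros_like(coeffs)
    mask[:cutoff, :cutoff] = 1.0
    low = dh.T @ (coeffs * mask) @ dw     # inverse DCT of the low band
    high = x - low                        # residual equals the high band
    return low, high, coeffs[:cutoff, :cutoff]


def linear_scan(seq, a=0.9, b=1.0, c=1.0):
    """Toy recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t (Mamba stand-in)."""
    h = 0.0
    out = np.empty_like(seq)
    for t, x_t in enumerate(seq):
        h = a * h + b * x_t
        out[t] = c * h
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(16, 16))
low, high, block = spectral_split(x, cutoff=4)
# The two bands sum back to the input because the orthonormal DCT is linear.
print(np.allclose(low + high, x))  # True
# Serialize the 4x4 low-frequency block into a 16-step sequence and scan it.
mixed = linear_scan(block.ravel()).reshape(block.shape)
```

Scanning only the low-frequency block keeps the sequence short (16 steps here instead of 256 pixels), while the high band stays in the spatial domain where boundary detail lives.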

Paper Structure

This paper contains 12 sections, 5 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: Architecture of SpectralMamba-UNet. SDM performs spectral decomposition, SCR reweights frequency responses, and SGF enables frequency-guided decoder fusion.
  • Figure 2: Qualitative comparison on Synapse, ACDC, EAT, IA, and DRIVE (left to right). Compared with representative baselines (c–g), SpectralMamba-UNet (h) produces sharper boundaries and improved topological consistency.
  • Figure 3: Qualitative comparison of ablation variants. From left to right: input image, ground truth, Baseline, +Freq, +Spatial Mamba, +Freq+SCR+SGF, +SDM, and the complete SpectralMamba-UNet. The full model produces clearer boundaries and improved structural continuity across datasets.
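Figure 1 describes SGF only as frequency-guided decoder fusion. One plausible reading, sketched below purely as an assumption, gates an encoder skip feature against an upsampled decoder feature using the skip feature's local high-frequency energy, so edge regions favor the detail-rich skip path. The helper names, the 2x2 average-pool high-pass, and the `alpha` sharpness parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def avgpool2x(x):
    """2x2 average pooling of a (C, H, W) array with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def spectral_guided_fusion(skip, deep, alpha=4.0):
    """Hypothetical SGF sketch (not the authors' implementation).

    skip: encoder feature (C, H, W); deep: decoder feature (C, H/2, W/2).
    Returns the fused map (C, H, W) and the spatial gate (1, H, W).
    """
    up = upsample2x(deep)
    # Crude high-pass: the skip feature minus its blocky 2x2 low-pass version.
    high = skip - upsample2x(avgpool2x(skip))
    energy = np.mean(np.abs(high), axis=0, keepdims=True)  # (1, H, W)
    # Sigmoid gate centered on the mean energy: edges (high energy) push the
    # gate toward 1 (skip path), flat regions toward 0 (decoder path).
    gate = 1.0 / (1.0 + np.exp(-alpha * (energy - energy.mean())))
    return gate * skip + (1.0 - gate) * up, gate


rng = np.random.default_rng(0)
skip = rng.normal(size=(8, 16, 16))
deep = rng.normal(size=(8, 8, 8))
fused, gate = spectral_guided_fusion(skip, deep)
print(fused.shape, gate.shape)  # (8, 16, 16) (1, 16, 16)
```

Only one decoder stage is shown; a multi-scale decoder would apply such a block at each resolution, fusing every skip connection with the output of the stage below it.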