S$^3$-Mamba: Small-Size-Sensitive Mamba for Lesion Segmentation

Gui Wang, Yuexiang Li, Wenting Chen, Meidan Ding, Wooi Ping Cheah, Rong Qu, Jianfeng Ren, Linlin Shen

TL;DR

This work targets the persistent challenge of small-lesion segmentation in medical images, where down-sampling often discards crucial local cues. It introduces S$^3$-Mamba, a small-size-sensitive extension of the Mamba framework with three components: an Enhanced Visual State Space Block (EnVSSBlock) that preserves fine details via channel-wise attention and residual connections; a Tensor-based Cross-feature Multi-scale Attention (TCMA) that fuses input-image, intermediate-layer, and edge features across multiple scales; and a regularized curriculum learning strategy that progressively shifts training toward harder, small-lesion samples through a Difficulty Measurer and a constrained training schedule. The authors formulate training as $\min_\phi \sum_i v_i l_i + g(\mathbf{v})$, where $l_i$ is the loss of the $i$-th sample and $v_i$ is its weight, with the regularization term $g(\mathbf{v}) = \lambda \sum_i \frac{1}{\mathcal{F}_\text{rank}(l_i)} + (1-\lambda) \sum_i v_i^2$ balancing the emphasis on easy versus hard samples and encouraging a gradual shift toward challenging cases. Empirical results on ISIC2018, CVC-ClinicDB, and a private Lymph dataset show that S$^3$-Mamba achieves state-of-the-art or near-state-of-the-art performance, particularly for small lesions, with a favorable balance between accuracy and computational cost (e.g., 27.58 GFLOPs and 4.64M parameters). Ablation studies confirm that the EnCFBlock, TCMA, and the curriculum strategy contribute complementary gains, validating the design choices. Overall, the approach offers a practical and scalable route to early disease assessment by improving small-lesion segmentation while remaining computationally efficient.
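To make the training objective concrete, the following is a minimal Python/NumPy sketch that evaluates $\sum_i v_i l_i + g(\mathbf{v})$ for a batch of per-sample losses. It is an illustration under stated assumptions, not the authors' implementation: the helper names `rank_ascending` and `regularized_objective` are hypothetical, and $\mathcal{F}_\text{rank}$ is assumed to return the 1-based rank of each loss in ascending order (the smallest loss gets rank 1). Note that, as the formula is written, the rank-based term depends only on the losses, not on $\mathbf{v}$.

```python
import numpy as np


def rank_ascending(losses):
    """Hypothetical F_rank: 1-based rank of each loss, smallest loss -> rank 1.

    Ties are broken by sample order (stable sort); this is an assumption.
    """
    losses = np.asarray(losses, dtype=float)
    order = np.argsort(losses, kind="stable")
    ranks = np.empty(len(losses), dtype=float)
    ranks[order] = np.arange(1, len(losses) + 1)
    return ranks


def regularized_objective(losses, v, lam):
    """Evaluate sum_i v_i * l_i + g(v), with g(v) taken exactly as written:
    g(v) = lam * sum_i 1 / F_rank(l_i) + (1 - lam) * sum_i v_i^2.
    """
    losses = np.asarray(losses, dtype=float)
    v = np.asarray(v, dtype=float)
    weighted_loss = np.sum(v * losses)                        # sum_i v_i l_i
    rank_term = lam * np.sum(1.0 / rank_ascending(losses))    # rank-based term
    weight_penalty = (1.0 - lam) * np.sum(v ** 2)             # quadratic penalty on weights
    return float(weighted_loss + rank_term + weight_penalty)
```

For example, losses (0.2, 0.8, 0.5) with weights (1.0, 0.0, 0.5) and $\lambda = 0.5$ give 0.45 + 0.5 · (1 + 1/3 + 1/2) + 0.5 · 1.25 ≈ 1.9917. Larger $\lambda$ shifts weight toward the rank-based term, while smaller $\lambda$ penalizes large sample weights more strongly.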

Abstract

Small lesions play a critical role in early disease diagnosis and in intervention for severe infections. Popular models often struggle to segment small lesions, as they occupy only a minor portion of an image, and down-sampling operations may lose the local features of small lesions. To tackle these challenges, we propose a Small-Size-Sensitive Mamba (S$^3$-Mamba), which promotes sensitivity to small lesions across three dimensions: channel, spatial, and training strategy. Specifically, an Enhanced Visual State Space block is designed to focus on small lesions by preserving local features through multiple residual connections and by selectively amplifying important details while suppressing irrelevant ones through channel-wise attention. A Tensor-based Cross-feature Multi-scale Attention is designed to integrate input image features and intermediate-layer features with edge features and to exploit the attentive support of features across multiple scales, thereby retaining spatial details of small lesions at various granularities. Finally, we introduce a novel regularized curriculum learning strategy that automatically assesses lesion size and sample difficulty and gradually shifts focus from easy samples to hard ones, such as small lesions. Extensive experiments on three medical image segmentation datasets demonstrate the superiority of S$^3$-Mamba, especially in segmenting small lesions. Our code is available at https://github.com/ErinWang2023/S3-Mamba.
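The curriculum idea in the abstract (assess lesion size and sample difficulty, then move from easy to hard samples) can be illustrated with a minimal NumPy sketch. Both pieces are assumptions for illustration only: difficulty is approximated as one minus the lesion's foreground fraction, so smaller lesions score as harder, and a linear pacing schedule admits a growing share of the easiest samples at each epoch. The function names and the `start_fraction` parameter are hypothetical; the paper's Difficulty Measurer and constrained training schedule may work differently.

```python
import numpy as np


def lesion_difficulty(masks):
    """Hypothetical difficulty score: 1 - foreground fraction of each binary mask.

    masks: array of shape (N, H, W); smaller lesions receive higher scores.
    """
    masks = np.asarray(masks, dtype=bool)
    area_fraction = masks.reshape(len(masks), -1).mean(axis=1)
    return 1.0 - area_fraction


def curriculum_subset(difficulty, epoch, total_epochs, start_fraction=0.3):
    """Indices of the samples admitted at `epoch` under an assumed linear pacing schedule.

    Training starts with the easiest `start_fraction` of the data and grows
    linearly until every sample is included at the last epoch.
    """
    n = len(difficulty)
    progress = min(epoch / max(total_epochs - 1, 1), 1.0)
    fraction = start_fraction + (1.0 - start_fraction) * progress
    k = min(n, max(1, int(np.ceil(fraction * n))))
    order = np.argsort(difficulty, kind="stable")  # easy -> hard
    return order[:k].tolist()
```

With four masks whose lesions cover 8, 1, 4, and 2 pixels of a 4×4 image, the first epoch of a 10-epoch run admits only the two largest (easiest) lesions, and the final epoch admits all four, ordered from easy to hard.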

Paper Structure

This paper contains 13 sections, 7 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Segmentation results of the CNN-based Unet [Unet], the Transformer-based H2Former [H2former], the Mamba-based VmUnet [2024vmunet], and the proposed S$^3$-Mamba for lesions of varying sizes. All models segment large and medium lesions well, but S$^3$-Mamba performs better on small lesions, capturing their fine details and contours more accurately.
  • Figure 2: (a) Overview of the proposed S$^3$-Mamba with TCMA and EnVSSBlock. (b) Detailed architecture of the TCMA, where input image features, intermediate-layer features, and edge features are divided into patches of three different sizes. A tensor-based attention mechanism derives dynamic weights for the patches at the three scales, exploits their interactions, and uses the TCMA features to modulate the features at the decoder layers. (c) Detailed structure of the EnVSSBlock, which explicitly evaluates and adaptively adjusts channel weights to enhance the representation of small-lesion features and preserves fine details through residual connections. (d) Detailed structure of the Enhanced Channel Feature Block (EnCFBlock), which enhances feature interaction across channels. (e) Architecture of the regularized curriculum learning strategy.
  • Figure 3: Segmentation results on the Lymph dataset. Red outlines represent the ground truth segmentation masks, and the blue marks indicate the model predictions.
  • Figure 4: Comparison with other models on the Lymph dataset in terms of FLOPs, DSC, and model size (represented by circle size).
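The channel reweighting described for the EnVSSBlock in Figure 2(c) can be approximated by a squeeze-and-excitation style gate followed by a residual connection. The NumPy sketch below shows that generic, swapped-in technique rather than the authors' block: global average pooling summarizes each channel, a small bottleneck with ReLU and a sigmoid produces per-channel weights, and adding the input back preserves the original detail. The function name and the weight matrices `w1` and `w2` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention_residual(features, w1, w2):
    """SE-style channel reweighting plus a residual add (illustrative only).

    features: (C, H, W) feature map
    w1: (C_r, C) bottleneck projection; w2: (C, C_r) expansion projection
    Returns the output feature map and the per-channel weights.
    """
    features = np.asarray(features, dtype=float)
    squeezed = features.mean(axis=(1, 2))            # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)          # bottleneck + ReLU -> (C_r,)
    weights = sigmoid(w2 @ hidden)                   # per-channel weights in (0, 1)
    reweighted = features * weights[:, None, None]   # amplify or suppress each channel
    return features + reweighted, weights            # residual path keeps fine detail
```

The residual path is what keeps small-lesion detail from vanishing even when a channel's gate is close to zero: the output never falls below the original input scaled by 1.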