S$^3$-Mamba: Small-Size-Sensitive Mamba for Lesion Segmentation
Gui Wang, Yuexiang Li, Wenting Chen, Meidan Ding, Wooi Ping Cheah, Rong Qu, Jianfeng Ren, Linlin Shen
TL;DR
This work targets the persistent challenge of segmenting small lesions in medical images, where down-sampling often discards crucial local cues. It introduces S$^3$-Mamba, a small-size-sensitive extension of the Mamba framework with three components: an Enhanced Visual State Space Block (EnVSSBlock) that preserves fine details via channel-wise attention and residual connections; a Tensor-based Cross-feature Multi-scale Attention (TCMA) module that fuses multi-scale, multi-modal features, including edge information; and a regularized curriculum learning strategy that uses a Difficulty Measurer and a constrained training schedule to progressively focus training on harder, small-lesion samples. The authors formulate training as $\min_\phi \sum_i v_i l_i + g(\mathbf{v})$, where $v_i$ is the weight on the loss $l_i$ of the $i$-th sample and the regularization term $g(\mathbf{v}) = \lambda \sum_i \frac{1}{\mathcal{F}_\text{rank}(l_i)} + (1-\lambda) \sum_i v_i^2$ balances the emphasis on easy versus hard samples, encouraging a gradual shift toward challenging cases. Empirical results on ISIC2018, CVC-ClinicDB, and a private Lymph dataset show that S$^3$-Mamba achieves state-of-the-art or near-state-of-the-art performance, particularly on small lesions, with a favorable balance of accuracy and computational efficiency (e.g., 27.58 GFLOPs and 4.64M parameters). Ablation studies confirm that EnVSSBlock, TCMA, and the curriculum strategy each contribute complementary gains, validating the design choices. Overall, the approach offers a practical and scalable solution for early disease assessment, improving small-lesion segmentation while remaining computationally efficient.
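The training objective above can be evaluated numerically, as in the minimal Python sketch below. It is not the authors' implementation: the helper names (`rank_losses`, `regularizer`, `curriculum_objective`) are hypothetical, and the sketch assumes that $\mathcal{F}_\text{rank}$ assigns rank 1 to the smallest loss (the easiest sample), with ranks increasing for harder samples. The source specifies neither the ranking direction nor how the weights $v_i$ are produced, so the weights are simply passed in.

```python
def rank_losses(losses):
    """Return 1-based ranks of the losses.

    ASSUMPTION (hypothetical): rank 1 = smallest loss, i.e. the easiest sample.
    Ties are broken by input order because sorted() is stable.
    """
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    ranks = [0] * len(losses)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def regularizer(v, losses, lam):
    """g(v) = lam * sum_i 1/F_rank(l_i) + (1 - lam) * sum_i v_i^2."""
    ranks = rank_losses(losses)
    rank_term = sum(1.0 / r for r in ranks)
    quad_term = sum(vi * vi for vi in v)
    return lam * rank_term + (1.0 - lam) * quad_term


def curriculum_objective(v, losses, lam):
    """Weighted loss sum_i v_i * l_i plus the regularizer g(v)."""
    if len(v) != len(losses):
        raise ValueError("v and losses must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    weighted = sum(vi * li for vi, li in zip(v, losses))
    return weighted + regularizer(v, losses, lam)


if __name__ == "__main__":
    losses = [0.2, 0.8, 0.5]   # per-sample losses l_i (toy values)
    v = [1.0, 0.5, 0.0]        # per-sample weights v_i (toy values)
    obj = curriculum_objective(v, losses, lam=0.5)
    print(f"{obj:.4f}")        # 2.1417
```

With distinct losses the ranks form a permutation of $1, \dots, n$, so the rank term equals $\sum_{r=1}^{n} 1/r$ whatever the ordering; as transcribed, it depends only on the losses and not on $\mathbf{v}$, so only the weighted loss and the quadratic term respond to changes in the weights.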
Abstract
Small lesions play a critical role in early disease diagnosis and in the intervention of severe infections. Popular models often struggle to segment small lesions, as they occupy only a minor portion of an image, and down-sampling operations may inevitably lose focus on their local features. To tackle these challenges, we propose a Small-Size-Sensitive Mamba (S$^3$-Mamba), which promotes sensitivity to small lesions across three dimensions: channel, spatial, and training strategy. Specifically, an Enhanced Visual State Space block is designed to focus on small lesions, preserving local features through multiple residual connections and selectively amplifying important details while suppressing irrelevant ones through channel-wise attention. A Tensor-based Cross-feature Multi-scale Attention is designed to integrate input image features and intermediate-layer features with edge features and to exploit the attentive support of features across multiple scales, thereby retaining spatial details of small lesions at various granularities. Finally, we introduce a novel regularized curriculum learning strategy that automatically assesses lesion size and sample difficulty and gradually shifts the focus from easy samples to hard ones, such as small lesions. Extensive experiments on three medical image segmentation datasets show the superiority of our S$^3$-Mamba, especially in segmenting small lesions. Our code is available at https://github.com/ErinWang2023/S3-Mamba.
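The idea of selectively amplifying channels while a residual path preserves local features can be illustrated with a squeeze-and-excitation-style gate, sketched below in plain Python. This is an assumption for illustration only: the abstract does not give the exact form of the channel-wise attention, and the function name, the weight matrices `w1`/`w2`, and the `x + gate * x` residual combination are hypothetical choices, not taken from the S3-Mamba code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def channel_attention_residual(x, w1, w2):
    """SE-style channel gating with an identity residual (hypothetical sketch).

    x:  C channels, each a flattened list of H*W activations.
    w1: hidden x C weights (reduction layer).
    w2: C x hidden weights (expansion layer).
    Returns (out, gates) where out[c] = x[c] + gates[c] * x[c].
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in x]
    # Excitation, step 1: linear reduction followed by ReLU.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    # Excitation, step 2: linear expansion followed by sigmoid, giving gates in (0, 1).
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Re-weight each channel, then add the unmodified input back (residual path).
    out = [[val + g * val for val in ch] for ch, g in zip(x, gates)]
    return out, gates


if __name__ == "__main__":
    x = [[1.0, 3.0], [4.0, 0.0]]   # 2 channels, 2 spatial positions each
    w1 = [[1.0, 0.0]]              # 1 hidden unit
    w2 = [[10.0], [-10.0]]         # opens gate 0, closes gate 1
    out, gates = channel_attention_residual(x, w1, w2)
    print([round(g, 3) for g in gates])  # [1.0, 0.0]
```

In this toy run, channel 0 is nearly doubled by its open gate, while channel 1 is suppressed by the attention branch yet still passes through unchanged via the residual path, which is the intuition behind combining channel-wise attention with residual connections.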
