Merging Context Clustering with Visual State Space Models for Medical Image Segmentation
Yun Zhu, Dong Zhang, Yi Lin, Yifei Feng, Jinhui Tang
TL;DR
This work tackles the challenge of medical image segmentation by enabling simultaneous modeling of long-range and local spatial context. It introduces CCViM, a Vision Mamba-based architecture that integrates a context clustering (CC) layer into a CCS6 module to adaptively form local windows while preserving global receptive fields. Extensive experiments on Kumar, CPM17, ISIC2017/2018, and Synapse demonstrate superior performance over state-of-the-art methods across nuclei, skin lesion, and multi-organ segmentation, with ablations confirming the CC layer’s effectiveness. The approach offers a computationally efficient means to fuse local and global information, with potential for adaptive scanning strategies and broader medical imaging tasks.
Abstract
Medical image segmentation requires aggregating global and local feature representations, which challenges current methods because they must handle both long-range and short-range feature interactions. Recently, Vision Mamba (ViM) models have emerged as promising solutions for reducing model complexity, as they excel at long-range feature interactions with linear complexity. However, existing ViM approaches directly flatten spatial tokens, which overlooks short-range local dependencies, and they are constrained by fixed scanning patterns that limit the capture of dynamic spatial context. To address these challenges, we introduce a simple yet effective method named context clustering ViM (CCViM), which incorporates a context clustering module within existing ViM models to partition image tokens into distinct windows for adaptable local clustering. Our method effectively combines long-range and short-range feature interactions, thereby enhancing spatial contextual representations for medical image segmentation tasks. Extensive experiments on diverse public datasets, namely Kumar, CPM17, ISIC17, ISIC18, and Synapse, demonstrate the superior performance of our method compared to current state-of-the-art methods. Our code can be found at https://github.com/zymissy/CCViM.
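To make the idea of partitioning tokens into windows for local clustering more concrete, the following is a minimal NumPy sketch, not the authors' implementation (see the repository above for that). It assumes non-overlapping square windows, four cluster centers per window taken as quadrant means, cosine-similarity assignment, and similarity-weighted aggregation. All function names (`window_partition`, `quadrant_centers`, `cluster_window`, `context_cluster`) and the default window size are hypothetical choices made for illustration.

```python
import numpy as np

def window_partition(x, w):
    """Split an (H, W, C) feature map into non-overlapping w x w windows.
    Returns an array of shape (num_windows, w * w, C)."""
    H, W, C = x.shape
    assert H % w == 0 and W % w == 0, "H and W must be divisible by w"
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)

def quadrant_centers(tokens, w):
    """Illustrative center proposal: mean feature of each window quadrant.
    tokens has shape (w * w, C); returns (4, C). Requires even w."""
    g = tokens.reshape(2, w // 2, 2, w // 2, -1)
    return g.mean(axis=(1, 3)).reshape(4, -1)

def cluster_window(tokens, centers, eps=1e-8):
    """Assign each token to its most similar center (cosine similarity),
    then replace it with the similarity-weighted mean of its cluster."""
    t = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + eps)
    c = centers / (np.linalg.norm(centers, axis=1, keepdims=True) + eps)
    sim = t @ c.T                    # (N, K) cosine similarities
    assign = sim.argmax(axis=1)      # hard assignment per token
    out = np.empty_like(tokens)
    for k in range(centers.shape[0]):
        mask = assign == k
        if not mask.any():
            continue                 # empty cluster: nothing to aggregate
        wts = np.clip(sim[mask, k], 0.0, None) + eps
        out[mask] = (wts[:, None] * tokens[mask]).sum(axis=0) / wts.sum()
    return out, assign

def context_cluster(x, w=4):
    """Apply window-local clustering to every window of an (H, W, C) map."""
    outs, assigns = [], []
    for tok in window_partition(x, w):
        out, a = cluster_window(tok, quadrant_centers(tok, w))
        outs.append(out)
        assigns.append(a)
    return np.stack(outs), np.stack(assigns)

# Example: an 8 x 8 map with 16 channels yields four 4 x 4 windows.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 16))
clustered, assignment = context_cluster(features, w=4)
```

In this sketch each token only interacts with tokens in its own window, which is the short-range part of the picture; in a ViM-style model the clustered tokens would still pass through a state space scan that supplies the long-range interactions.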
