ShapeMamba-EM: Fine-Tuning Foundation Model with Local Shape Descriptors and Mamba Blocks for 3D EM Image Segmentation
Ruohua Shi, Qiufan Pang, Lei Ma, Lingyu Duan, Tiejun Huang, Tingting Jiang
TL;DR
ShapeMamba-EM tackles the domain gap in EM segmentation by fine-tuning a 3D medical foundation model (SAM-Med3D). It uses FacT for efficient encoder adaptation, adds a 3D Mamba Adapter to model long-range dependencies, and adds a 3D Local Shape Descriptor (LSD) Encoder to capture local morphology. The LSD is a 10-dimensional per-voxel embedding defined by $lsd^{y}(v)=( s(S_v), m(S_v) - v, c(S_v) )$ with $S_v=\{ v' \in \Omega \mid y(v)=y(v'), \|v - v'\|_2^2 \le \sigma \}$, where $s(S_v)=|S_v|$, $m(S_v)=\frac{1}{|S_v|}\sum_{v' \in S_v} v'$, and $c(S_v)=\frac{1}{|S_v|}\sum_{v' \in S_v} (v' - m(S_v))(v' - m(S_v))^T$. A 3D U-Net learns $lsd^{x}: \Omega \to \mathbb{R}^{10}$ from raw data and provides it as input to the Image Decoder, while the prompt encoder is discarded and the decoder is fully fine-tuned. Evaluated on 10 EM datasets across five tasks, ShapeMamba-EM outperforms state-of-the-art baselines and several SAM fine-tuning strategies, demonstrating improved segmentation accuracy and efficiency for high-resolution neural tissue analysis. The results establish ShapeMamba-EM as a versatile and scalable approach for leveraging medical foundation models in 3D EM segmentation, with potential to advance connectomics and neuroscience research.
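To make the LSD definition concrete, the following is a minimal NumPy sketch that evaluates $lsd^{y}(v)$ at a single voxel of a label volume. It is an illustration under stated assumptions, not the authors' implementation: the function name `lsd_at_voxel`, the brute-force search over the whole volume, and the packing of the descriptor as 1 size value + 3 mean-offset components + 6 unique entries of the symmetric $3 \times 3$ covariance (which is one natural way to arrive at 10 dimensions) are all choices made here for clarity.

```python
import numpy as np

def lsd_at_voxel(labels, v, sigma):
    """Sketch of lsd^y(v) for one voxel v = (z, y, x) of a 3D label volume.

    Hypothetical helper; the 1 + 3 + 6 packing order is an assumption.
    """
    v = np.asarray(v)
    # Integer coordinates of every voxel in the volume, shape (N, 3).
    grids = np.meshgrid(*[np.arange(n) for n in labels.shape], indexing="ij")
    coords = np.stack(grids, axis=-1).reshape(-1, 3)

    # S_v: voxels with the same label as v whose squared distance is <= sigma,
    # following the inequality as written in the text.
    same_label = labels.reshape(-1) == labels[tuple(v)]
    in_ball = np.sum((coords - v) ** 2, axis=1) <= sigma
    S = coords[same_label & in_ball].astype(float)

    size = float(len(S))                 # s(S_v) = |S_v|
    mean = S.mean(axis=0)                # m(S_v)
    centered = S - mean
    cov = centered.T @ centered / size   # c(S_v), a symmetric 3x3 matrix

    # Keep only the upper triangle of the covariance (6 unique entries).
    upper = np.triu_indices(3)
    return np.concatenate([[size], mean - v, cov[upper]])
```

For example, calling `lsd_at_voxel(np.zeros((5, 5, 5), dtype=int), (2, 2, 2), sigma=1)` on a uniformly labeled volume gives a neighborhood of 7 voxels (the center and its 6 face neighbors) with zero mean offset. In practice one would compute the descriptor densely, for instance with Gaussian-weighted convolutions, rather than looping over voxels; the per-voxel form is used here only because it mirrors the formula directly.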
Abstract
Electron microscopy (EM) imaging offers unparalleled resolution for analyzing neural tissues, which is crucial for uncovering the intricacies of synaptic connections and neural processes fundamental to understanding behavioral mechanisms. Recently, foundation models have demonstrated impressive performance across numerous natural and medical image segmentation tasks. However, applying these foundation models to EM segmentation faces significant challenges due to domain disparities. This paper presents ShapeMamba-EM, a specialized fine-tuning method for 3D EM segmentation, which employs adapters for long-range dependency modeling and an encoder for local shape description within the original foundation model. This approach effectively addresses the unique volumetric and morphological complexities of EM data. Tested over a wide range of EM images, covering five segmentation tasks and 10 datasets, ShapeMamba-EM outperforms existing methods, establishing a new standard in EM image segmentation and enhancing the understanding of neural tissue architecture.
