SAM2-3dMed: Empowering SAM2 for 3D Medical Image Segmentation
Yeqing Yang, Le Xu, Lixia Tian
TL;DR
This work addresses the challenge of adapting video-oriented foundation models to 3D medical image segmentation, where inter-slice continuity is bidirectional and accurate boundaries matter. It proposes SAM2-3dMed, which adds a Slice Relative Position Prediction (SRPP) module to learn bidirectional inter-slice relationships and a Boundary Detection (BD) branch to sharpen boundary delineation, while keeping the SAM2 encoder frozen. Evaluations on the MSD Lung, Spleen, and Pancreas tasks show consistent gains in Dice, IoU, and Normalized Surface Dice (NSD), and reductions in the 95th-percentile Hausdorff distance (HD95), compared with state-of-the-art baselines. The results suggest a general paradigm for transferring video foundation models to volumetric medical data, enabling data-efficient, high-precision segmentation with potential clinical impact.
Abstract
Accurate segmentation of 3D medical images is critical for clinical applications such as disease assessment and treatment planning. While the Segment Anything Model 2 (SAM2) has shown remarkable success in video object segmentation by leveraging temporal cues, its direct application to 3D medical images faces two fundamental domain gaps: 1) the bidirectional anatomical continuity between slices contrasts sharply with the unidirectional temporal flow in videos, and 2) precise boundary delineation, crucial for morphological analysis, is often underexplored in video tasks. To bridge these gaps, we propose SAM2-3dMed, an adaptation of SAM2 for 3D medical imaging. Our framework introduces two key innovations: 1) a Slice Relative Position Prediction (SRPP) module that explicitly models bidirectional inter-slice dependencies by guiding SAM2 to predict the relative positions of different slices in a self-supervised manner; 2) a Boundary Detection (BD) module that enhances segmentation accuracy along critical organ and tissue boundaries. Extensive experiments on three diverse medical datasets (the Lung, Spleen, and Pancreas tasks of the Medical Segmentation Decathlon (MSD)) demonstrate that SAM2-3dMed significantly outperforms state-of-the-art methods in both segmentation overlap and boundary precision. Our approach not only advances 3D medical image segmentation performance but also offers a general paradigm for adapting video-centric foundation models to spatial volumetric data.
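The abstract does not specify how the SRPP and BD training targets are built, so the following is only a minimal sketch of one plausible construction, not the authors' implementation. It assumes that SRPP targets are signed, normalized index offsets between sampled slice pairs (so that direction is encoded in the sign) and that BD targets are the foreground pixels of a ground-truth mask that touch the background. The names `sample_slice_pairs` and `boundary_map` are hypothetical.

```python
import numpy as np


def sample_slice_pairs(num_slices, num_pairs, rng):
    """Sample slice index pairs (i, j) and a signed relative-position target.

    Assumption: the target is the normalized offset (j - i) / (num_slices - 1),
    which lies in [-1, 1]. No manual labels are needed, which is what makes
    the task self-supervised.
    """
    i = rng.integers(0, num_slices, size=num_pairs)
    j = rng.integers(0, num_slices, size=num_pairs)
    offset = (j - i) / max(num_slices - 1, 1)
    return i, j, offset


def boundary_map(mask):
    """Return the boundary of a 2D binary mask.

    Assumption: a boundary pixel is a foreground pixel with at least one
    background pixel among its 4-connected neighbours.
    """
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior when all four neighbours are foreground.
    interior = (
        padded[:-2, 1:-1]    # neighbour above
        & padded[2:, 1:-1]   # neighbour below
        & padded[1:-1, :-2]  # neighbour to the left
        & padded[1:-1, 2:]   # neighbour to the right
    )
    return mask & ~interior


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    i, j, offset = sample_slice_pairs(num_slices=10, num_pairs=5, rng=rng)
    print(i, j, offset)

    mask = np.zeros((5, 5), dtype=np.uint8)
    mask[1:4, 1:4] = 1
    print(boundary_map(mask).astype(int))
```

Swapping the two slices of a pair flips the sign of the target, which is one simple way to expose the bidirectional ordering of slices that a purely forward, video-style model would not see.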
