SAM2-3dMed: Empowering SAM2 for 3D Medical Image Segmentation

Yeqing Yang, Le Xu, Lixia Tian

TL;DR

This work addresses the challenge of adapting video-oriented foundation models to 3D medical image segmentation by modeling spatial slice continuity and improving boundary accuracy. It proposes SAM2-3dMed, which adds a Slice Relative Position Prediction (SRPP) module to learn bidirectional inter-slice relationships and a Boundary Detection (BD) branch to sharpen boundary delineation, while keeping the SAM2 encoder frozen. Evaluations on the MSD Lung, Spleen, and Pancreas tasks show consistent gains in Dice, IoU, and NSD and reductions in HD95 relative to state-of-the-art baselines. The authors present the approach as a general paradigm for transferring video foundation models to volumetric medical data, enabling data-efficient, high-precision segmentation with potential clinical impact.
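For readers unfamiliar with the overlap metrics named above, the following is a minimal sketch of Dice and IoU for binary masks. It is not taken from the paper: the function name `dice_and_iou`, the flat 0/1 list representation, and the convention for two empty masks are assumptions made for illustration. NSD and HD95 are surface-distance metrics and are not covered here.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")

    # Count voxels that are foreground in both masks, and in each mask alone.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_size = sum(1 for p in pred if p)
    target_size = sum(1 for t in target if t)
    union = pred_size + target_size - intersection

    if union == 0:
        # Both masks are empty. Treating this as perfect agreement is an
        # assumed convention, not something the paper specifies.
        return 1.0, 1.0

    dice = 2.0 * intersection / (pred_size + target_size)
    iou = intersection / union
    return dice, iou


# Toy example: two 6-voxel masks that share 2 foreground voxels.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, target)
print(f"Dice={dice:.3f} IoU={iou:.3f}")
```

Dice weights the intersection twice against the summed mask sizes, so for the same pair of masks it is never lower than IoU.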

Abstract

Accurate segmentation of 3D medical images is critical for clinical applications such as disease assessment and treatment planning. While the Segment Anything Model 2 (SAM2) has shown remarkable success in video object segmentation by leveraging temporal cues, its direct application to 3D medical images faces two fundamental domain gaps: 1) the bidirectional anatomical continuity between slices contrasts sharply with the unidirectional temporal flow in videos, and 2) precise boundary delineation, crucial for morphological analysis, is often underexplored in video tasks. To bridge these gaps, we propose SAM2-3dMed, an adaptation of SAM2 for 3D medical imaging. Our framework introduces two key innovations: 1) a Slice Relative Position Prediction (SRPP) module that explicitly models bidirectional inter-slice dependencies by guiding SAM2 to predict the relative positions of different slices in a self-supervised manner; 2) a Boundary Detection (BD) module that enhances segmentation accuracy along critical organ and tissue boundaries. Extensive experiments on three diverse medical datasets (the Lung, Spleen, and Pancreas tasks of the Medical Segmentation Decathlon (MSD) dataset) demonstrate that SAM2-3dMed significantly outperforms state-of-the-art methods, achieving superior performance in segmentation overlap and boundary precision. Our approach not only advances 3D medical image segmentation performance but also offers a general paradigm for adapting video-centric foundation models to spatial volumetric data.
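To make the self-supervised idea behind SRPP more concrete, here is a minimal sketch of how relative-position targets could be generated from slice indices alone, with no manual labels. This is an illustration under assumptions, not the authors' implementation: the abstract does not give the exact target definition, so the signed, normalized offset used here and the helper names `relative_offset` and `sample_slice_pairs` are hypothetical.

```python
import random


def relative_offset(i, j, num_slices):
    """Signed offset from slice i to slice j, normalized to [-1, 1].

    Positive means j lies after i along the slice axis, negative means before.
    The sign is what makes the target bidirectional. (Assumed formulation.)
    """
    if num_slices < 2:
        raise ValueError("need at least two slices")
    return (j - i) / (num_slices - 1)


def sample_slice_pairs(num_slices, num_pairs, seed=0):
    """Sample (i, j, target) triples of distinct slices from one volume."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_pairs):
        # Draw two different slice indices; either order can occur, so the
        # model would see both "forward" and "backward" relationships.
        i, j = rng.sample(range(num_slices), 2)
        pairs.append((i, j, relative_offset(i, j, num_slices)))
    return pairs


# Toy example: a 10-slice volume, 3 training pairs.
for i, j, target in sample_slice_pairs(num_slices=10, num_pairs=3):
    print(f"slice {i} -> slice {j}: target offset {target:+.2f}")
```

In a setup like this, a small prediction head would take features of the two slices and regress the target offset, so the supervision signal comes for free from the volume's own slice ordering.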

Paper Structure

This paper contains 31 sections, 10 equations, 8 figures, and 9 tables.

Figures (8)

  • Figure 1: The comparison focuses on inter-frame dependencies in videos (a) vs. inter-slice dependencies in 3D medical images (b), and the importance of boundary segmentation for videos (c) vs. that for medical images (d).
  • Figure 2: Overview of the proposed SAM2-3dMed Model.
  • Figure 3: Typical segmentation maps for the three tasks. The cyan boxes highlight lower inter-slice continuity, and the orange arrows highlight worse boundary segmentations.
  • Figure 4: Visual comparison of segmentation results with and without Pre-training.
  • Figure 5: Visual comparison of segmentation results with and without SRPP module.
  • ...and 3 more figures