
Interactive 3D Medical Image Segmentation with SAM 2

Chuyun Shen, Wenhao Li, Yuhang Shi, Xiangfeng Wang

TL;DR

The paper addresses the high annotation burden in 3D medical image segmentation (MIS) by applying SAM 2 in a zero-shot setting. It treats a 3D volume as a video, uses SAM 2 to propagate an annotation from a single 2D slice to all other slices, and proposes a practical pipeline for this workflow. Experiments on BraTS2020 and the Medical Segmentation Decathlon (MSD) show that SAM 2 does not match supervised methods on average, but it narrows the gap for certain organs and substantially reduces labeling effort, because only quick 2D interactions are needed. The work lays a foundation for using video-trained foundation models in 3D MIS and releases open-source code to support further research and clinical adoption.

Abstract

Interactive medical image segmentation (IMIS) has shown significant potential for improving segmentation accuracy by integrating iterative feedback from medical professionals. However, the limited availability of 3D medical data restricts the generalization and robustness of most IMIS methods. The Segment Anything Model (SAM), though effective for 2D images, requires expensive semi-automatic, slice-by-slice annotation for 3D medical images. In this paper, we explore the zero-shot capabilities of SAM 2, Meta's next-generation SAM model trained on videos, for 3D medical image segmentation. By treating the sequential 2D slices of a 3D image as video frames, SAM 2 can fully automatically propagate annotations from a single frame to the entire 3D volume. We propose a practical pipeline for using SAM 2 in 3D medical image segmentation and present key findings highlighting its efficiency and potential for further optimization. Concretely, numerical experiments on the BraTS2020 and Medical Segmentation Decathlon (MSD) datasets show that SAM 2 still lags behind supervised methods but can narrow the gap in specific settings and for specific organ types, significantly reducing the annotation burden on medical professionals. Our code will be open-sourced and available at https://github.com/Chuyun-Shen/SAM_2_Medical_3D.
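The volume-as-video propagation idea can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict` is a hypothetical stand-in for SAM 2's mask predictor, simplified to condition only on the neighbouring slice's mask (SAM 2 itself keeps a memory of several previously processed frames), and the central slice defaults to the middle index `n // 2`, which may differ from how the paper picks it. The names `volume_to_frames`, `propagate_from_center`, and `toy_predict` are illustrative only.

```python
from typing import Callable, List, Optional

import numpy as np

# Hypothetical predictor signature: (mask of the neighbouring, already processed slice,
# current slice image) -> boolean mask for the current slice.
Predictor = Callable[[np.ndarray, np.ndarray], np.ndarray]


def volume_to_frames(volume: np.ndarray, axis: int = 0) -> List[np.ndarray]:
    """Treat a 3D volume as a video: each 2D slice along `axis` becomes one frame."""
    return [np.take(volume, i, axis=axis) for i in range(volume.shape[axis])]


def propagate_from_center(volume: np.ndarray, center_mask: np.ndarray,
                          predict: Predictor, axis: int = 0,
                          center: Optional[int] = None) -> np.ndarray:
    """Spread a 2D mask on the annotated (central) slice to all slices, in both directions."""
    frames = volume_to_frames(volume, axis)
    n = len(frames)
    if center is None:
        center = n // 2  # assumption: middle slice of the volume
    masks: List[np.ndarray] = [np.zeros_like(f, dtype=bool) for f in frames]
    masks[center] = np.asarray(center_mask, dtype=bool)  # the single annotated slice
    # Forward pass: center+1 .. n-1, each conditioned on the slice just before it.
    for i in range(center + 1, n):
        masks[i] = predict(masks[i - 1], frames[i])
    # Backward pass: center-1 .. 0, each conditioned on the slice just after it.
    for i in range(center - 1, -1, -1):
        masks[i] = predict(masks[i + 1], frames[i])
    # Reassemble the per-slice masks into a volume with the original layout.
    return np.stack(masks, axis=axis)


def toy_predict(prev_mask: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Toy stand-in for SAM 2: keep previously segmented pixels that are still bright."""
    return prev_mask & (frame > 0.5)


# Usage: a small bright cube spanning slices 1..3 of a 5-slice volume.
volume = np.zeros((5, 4, 4))
volume[1:4, 1:3, 1:3] = 1.0
seg = propagate_from_center(volume, volume[2] > 0.5, toy_predict)
# seg matches volume > 0.5: the cube is recovered and slices 0 and 4 stay empty.
```

Running one pass from the annotated slice toward each end of the volume mirrors the bidirectional propagation described for the pipeline. In a real setup, `toy_predict` would be replaced by SAM 2's video predictor; the official SAM 2 release exposes a `propagate_in_video` method with a `reverse` flag that can serve the backward pass.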

Paper Structure

This paper contains 13 sections, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: Pipeline diagram: using SAM 2 to propagate slice annotations for 3D interactive medical image segmentation. The central slice is first segmented either by a 2D segmentation algorithm or by a human expert, through manual labeling or an interactive semi-automatic algorithm. SAM 2 takes this mask as a prompt and then predicts all other slices sequentially in both directions, ultimately producing annotations for all slices.
  • Figure 2: Comparison with 3D interactive methods and supervised methods. The orange bars represent 3D interactive algorithms, which typically handle 3D images by resizing. The blue bars denote supervised learning algorithms, which usually process 3D images in patches. The green bars denote SAM 2-based segmentation. "5 clicks" means clicking five points on the central 2D slice with SAM, one point per round, to generate the 2D slice annotation, which is then propagated to the 3D volume. "1 mask" means providing SAM 2 with the ground-truth mask of the central 2D slice, which is then propagated to the 3D volume. "Salient area" refers to results evaluated only on slices with more than 256 foreground pixels. The bidirectional arrows indicate the Dice score difference between the SAM 2-based algorithms and the best-performing algorithms. Chart 1 compares the Dice scores of 3D interactive algorithms and SAM 2 on the BraTS2020, Spleen, and Liver datasets, while Chart 2 compares the Dice scores of supervised algorithms and SAM 2 on the Spleen, Liver, Lung, and Pancreas datasets.
  • Figure 3: Dice score growth per added point in each round. On the BraTS2020 benchmark, we evaluated how much the average Dice score improves per additional point in each round for different interactive algorithms. The four baseline algorithms (DeepIGeoS, InterCNN, IteR-MRL, and MECCA) select 25 points on the 3D medical image in the first round, followed by 5 additional points per round. In contrast, our pipeline with SAM 2 adds one point per round.
  • Figure 4: Interactive segmentation on a slice with SAM 2.
  • Figure 5: SAM 2 with different numbers of iterative steps on the BraTS2020 benchmark.
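The evaluation quantities named in the Figure 2 and Figure 3 captions (Dice score, the "salient area" restriction, and Dice growth per added point) can be made concrete with a short sketch. It rests on assumptions the captions do not settle: masks are boolean arrays, the 256-pixel threshold is counted on each slice's ground-truth mask, Dice is computed jointly over the retained slices, and the Dice score before the first round is taken as 0 when computing per-point gain. All function names are hypothetical.

```python
import numpy as np


def dice(pred, gt):
    """Dice = 2|P ∩ G| / (|P| + |G|) for boolean masks; 1.0 if both are empty."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def salient_dice(pred_vol, gt_vol, min_fg=256, axis=0):
    """Dice over only those slices whose ground truth has more than `min_fg` foreground pixels."""
    gt_vol = np.asarray(gt_vol, dtype=bool)
    other_axes = tuple(a for a in range(gt_vol.ndim) if a != axis)
    fg_per_slice = gt_vol.sum(axis=other_axes)      # foreground pixel count per slice
    keep = np.flatnonzero(fg_per_slice > min_fg)    # indices of "salient" slices
    return dice(np.take(pred_vol, keep, axis=axis), np.take(gt_vol, keep, axis=axis))


def dice_gain_per_point(dice_by_round, points_by_round):
    """Average Dice improvement per added click in each round (Dice before round 1 assumed 0)."""
    gains, prev = [], 0.0
    for d, k in zip(dice_by_round, points_by_round):
        gains.append((d - prev) / k)
        prev = d
    return gains


# Click schedules from the Figure 3 caption: the baselines use 25 points in the first
# round and 5 per later round; the SAM 2 pipeline adds one point per round.
baseline_points = [25, 5, 5, 5]
sam2_points = [1, 1, 1, 1]
```

Dividing each round's improvement by the number of points added in that round is what makes a 25-point first round and a 1-point round comparable, which is the point of the per-point view in Figure 3.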