Segment anything model 2: an application to 2D and 3D medical images

Haoyu Dong; Hanxue Gu; Yaqian Chen; Jichen Yang; Yuwen Chen; Maciej A. Mazurowski

TL;DR

This study assesses the Segment Anything Model 2 (SAM 2) for 2D and 3D medical image segmentation across 21 datasets. It introduces three evaluation settings (single-frame 2D, multi-frame 3D, and interactive multi-frame 3D), scored by IoU over non-empty slices, and explores a wide range of prompt types, memory propagation schemes, and interaction strategies. SAM 2 matches SAM in 2D, but its 3D performance depends on the propagation strategy, the choice of initial frame, and the prompt modality; bidirectional propagation and box prompts yield strong results. The work offers recommendations for applying SAM 2 to 3D medical imaging and outlines directions for improving memory-based 3D segmentation and interactive prompting in clinical contexts.
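As an illustration of the metric mentioned above, the following is a minimal sketch of IoU restricted to non-empty slices. It assumes that IoU is computed per slice and then averaged, and that "non-empty" refers to slices whose ground-truth mask contains at least one foreground pixel; the function name `mean_iou_nonempty` and the array layout are hypothetical and not taken from the paper's code.

```python
import numpy as np

def mean_iou_nonempty(pred, gt):
    """Mean per-slice IoU over slices whose ground-truth mask is non-empty.

    pred, gt: boolean arrays of shape (num_slices, H, W).
    Assumption: per-slice averaging; the paper may aggregate differently.
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    ious = []
    for p, g in zip(pred, gt):
        if not g.any():
            # Skip slices with no annotated object.
            continue
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        # union > 0 is guaranteed here because g is non-empty.
        ious.append(inter / union)
    # Return NaN when no slice contains the object.
    return float(np.mean(ious)) if ious else float("nan")
```

Skipping empty slices keeps predictions on object-free slices from affecting the score, which also means false positives on those slices are not penalized under this sketch.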

Abstract

Segment Anything Model (SAM) has gained significant attention because of its ability to segment various objects in images given a prompt. The recently developed SAM 2 has extended this ability to video inputs. This opens an opportunity to apply SAM to the segmentation of 3D images, one of the fundamental tasks in the medical imaging field. In this paper, we extensively evaluate SAM 2's ability to segment both 2D and 3D medical images by first collecting 21 medical imaging datasets, including surgical videos, common 3D modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), as well as 2D modalities such as X-ray and ultrasound. Two evaluation settings of SAM 2 are considered: (1) multi-frame 3D segmentation, where prompts are provided to one or multiple slice(s) selected from the volume, and (2) single-frame 2D segmentation, where prompts are provided to each slice. The former only applies to videos and 3D modalities, while the latter applies to all datasets. Our results show that SAM 2 performs similarly to SAM under single-frame 2D segmentation, and has variable performance under multi-frame 3D segmentation depending on factors such as the choice of slices to annotate, the direction of propagation, and the predictions used during propagation. We believe our work enhances the understanding of SAM 2's behavior in the medical field and provides directions for future work in adapting SAM 2 to this domain. Our code is available at: https://github.com/mazurowski-lab/segment-anything2-medical-evaluation.
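To make the "direction of propagation" concrete, here is a small sketch of the slice orderings involved when a single annotated slice seeds the rest of a volume. It only computes visiting orders and does not call SAM 2; the helper name `propagation_orders` is hypothetical, and treating the unidirectional case as forward-only is an assumption rather than the paper's stated definition.

```python
def propagation_orders(num_slices, start, bidirectional=True):
    """Return (forward, backward) slice index orders from an annotated slice.

    num_slices: total number of slices in the volume.
    start: index of the slice that receives the prompt.
    Assumption: unidirectional propagation visits only start..end.
    """
    if not 0 <= start < num_slices:
        raise ValueError("start must be a valid slice index")
    # Forward pass: from the annotated slice to the last slice.
    forward = list(range(start, num_slices))
    # Backward pass: from the annotated slice down to the first slice.
    backward = list(range(start, -1, -1)) if bidirectional else []
    return forward, backward
```

For a 5-slice volume annotated at slice 2, the forward pass visits slices 2, 3, 4 and the backward pass visits slices 2, 1, 0; without the backward pass, slices 0 and 1 would receive no prediction.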

Paper Structure (21 sections, 13 figures, 3 tables, 2 algorithms)

This paper contains 21 sections, 13 figures, 3 tables, 2 algorithms.

Introduction
Methods
Evaluation Criteria for Single-Frame 2D Segmentation
Evaluation Criteria for Multi-Frame 3D Segmentation
Evaluation Criteria for Interactive Multi-Frame 3D Segmentation
Dataset
2D Datasets
3D Datasets
Experimental Results
Results of SAM 2 under Single-frame 2D Segmentation
Results of SAM 2 under Multi-frame 3D Segmentation
Impact of Propagation Mode
Impact of Predicted Mask Selection
Impact of Initial Frame Selection
Impact of Prompt Modes
...and 6 more sections

Figures (13)

Figure 1: The pipeline of evaluating SAM 2 in the 3D setting. Different modes at each stage are proposed and evaluated.
Figure 2: Examples from all 21 datasets, each overlaid with annotation masks. The top rows feature 15 examples from 3D datasets, while the bottom row presents 6 examples from 2D datasets. The human anatomy figure is from Vecteezy.com.
Figure 3: The performance of SAM 2 under single-frame 2D segmentation. Four prompt modes are considered, with results ranked in descending order based on P-Mode 4.
Figure 4: The single-frame 2D segmentation performance of SAM under 4 prompting modes across 24 segmentation tasks (in gray), and the difference between the performance of SAM 2 and SAM. The differences are highlighted in red (when SAM has a higher IoU) and green (when SAM 2 has a higher IoU).
Figure 5: The multi-frame 3D segmentation performance of SAM 2 under all mode combinations, averaged over all datasets. {F, P, S, D}-Mode stands for the frame to annotate, the prompt type, the selection of the predicted masks, and the direction of propagation, respectively. The details of each mode are shown in Figure 1 and the Methods section.
...and 8 more figures
