Segment anything model 2: an application to 2D and 3D medical images
Haoyu Dong, Hanxue Gu, Yaqian Chen, Jichen Yang, Yuwen Chen, Maciej A. Mazurowski
TL;DR
This study assesses the Segment Anything Model 2 (SAM 2) for 2D and 3D medical image segmentation across 21 datasets. It introduces three evaluation settings (single-frame 2D, multi-frame 3D, and interactive multi-frame 3D), each scored by IoU over non-empty slices, and explores a wide space of prompts, memory propagation, and interaction strategies. Key findings show that SAM 2 matches SAM in 2D, whereas its 3D performance hinges on propagation strategy, initial frame choice, and prompt modality, with bidirectional propagation and box prompts yielding strong results. The work offers actionable recommendations for applying SAM 2 to 3D medical imaging and outlines directions for improving memory-based 3D segmentation and interactive prompting in clinical contexts.
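To make the metric concrete, the following is a minimal sketch of IoU averaged over non-empty slices. It rests on two assumptions not stated in this summary: that "non-empty" refers to slices whose ground-truth mask contains at least one foreground pixel, and that the per-slice IoU values are averaged with equal weight. The function name `mean_iou_nonempty` is hypothetical and is not taken from the authors' code.

```python
import numpy as np


def mean_iou_nonempty(pred, gt):
    """Mean per-slice IoU, skipping slices whose ground truth is empty.

    pred, gt: binary arrays of shape (num_slices, height, width).
    Assumption: a slice is "non-empty" when its ground-truth mask has
    at least one foreground pixel.
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    ious = []
    for p, g in zip(pred, gt):
        if not g.any():
            continue  # empty ground-truth slice: excluded from the average
        intersection = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()  # > 0 because g is non-empty
        ious.append(intersection / union)

    # No non-empty slice at all: the metric is undefined.
    return float(np.mean(ious)) if ious else float("nan")
```

Skipping empty slices keeps the score from being dominated by the many slices of a volume that do not contain the target object.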
Abstract
Segment Anything Model (SAM) has gained significant attention because of its ability to segment various objects in images given a prompt. The recently developed SAM 2 has extended this ability to video inputs. This opens an opportunity to apply SAM 2 to the segmentation of 3D images, one of the fundamental tasks in the medical imaging field. In this paper, we extensively evaluate SAM 2's ability to segment both 2D and 3D medical images by first collecting 21 medical imaging datasets, including surgical videos, common 3D modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), as well as 2D modalities such as X-ray and ultrasound. Two evaluation settings of SAM 2 are considered: (1) multi-frame 3D segmentation, where prompts are provided to one or multiple slice(s) selected from the volume, and (2) single-frame 2D segmentation, where prompts are provided to each slice. The former applies only to videos and 3D modalities, while the latter applies to all datasets. Our results show that SAM 2 performs similarly to SAM under single-frame 2D segmentation, and that its performance under multi-frame 3D segmentation varies with factors such as the choice of slices to annotate, the direction of propagation, and the predictions used during propagation. We believe our work enhances the understanding of SAM 2's behavior in the medical field and provides directions for future work in adapting SAM 2 to this domain. Our code is available at: https://github.com/mazurowski-lab/segment-anything2-medical-evaluation.
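As an illustration of how the choice of annotated slice and the propagation direction interact in the multi-frame 3D setting, the sketch below computes the order in which slices would be visited when predictions are propagated in both directions from one prompted slice. It is a plain-Python illustration under stated assumptions: the helper name `bidirectional_order` is hypothetical, it does not call the SAM 2 API, and it is not the authors' implementation.

```python
def bidirectional_order(num_slices, start):
    """Return (forward, backward) slice orders from one prompted slice.

    forward:  start, start + 1, ..., num_slices - 1
    backward: start, start - 1, ..., 0
    Assumption: both passes begin at the prompted slice, so its
    prediction seeds propagation in each direction.
    """
    if not 0 <= start < num_slices:
        raise ValueError("start must be a valid slice index")
    forward = list(range(start, num_slices))
    backward = list(range(start, -1, -1))
    return forward, backward


# Example: a 5-slice volume with the prompt placed on the middle slice.
forward, backward = bidirectional_order(5, 2)
print(forward)   # [2, 3, 4]
print(backward)  # [2, 1, 0]
```

With a single forward pass, slices before the prompted one would never receive a prediction, which is one way the selected slice and the direction can change the result.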
