
Segment Anything Model for Medical Image Analysis: an Experimental Study

Maciej A. Mazurowski, Haoyu Dong, Hanxue Gu, Jichen Yang, Nicholas Konz, Yixin Zhang

TL;DR

This study evaluates the Segment Anything Model (SAM) on diverse medical imaging datasets to quantify its zero-shot segmentation capabilities and how prompting strategies affect performance. It systematically tests five non-iterative prompting modes, iterative prompting, and the segment-everything option across 28 tasks, and compares SAM against leading interactive segmentation methods. Key findings show that box prompts, especially one box per separate object part, yield the best average performance, while iterative prompting offers only limited improvement for SAM; performance varies widely across datasets and object characteristics. The work provides practical guidance on prompting strategies for medical image segmentation and outlines future directions, including adapting SAM to medical imaging and to 3D data.
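The per-part prompting described above can be made concrete with a short sketch. Assuming a binary ground-truth mask, the code below splits it into 4-connected components and derives one box and one point per component, similar in spirit to the "1 box at each object region" and "1 point at each object region" modes. The connectivity choice, the rule of placing the point at the pixel farthest from the component boundary, and the helper names (`connected_components`, `box_prompt`, `point_prompt`) are illustrative assumptions, not the paper's documented prompt-generation procedure. Boxes are returned as (x_min, y_min, x_max, y_max) and points as (x, y), the ordering used by SAM's reference `SamPredictor`.

```python
from collections import deque

import numpy as np


def connected_components(mask: np.ndarray) -> list[np.ndarray]:
    """Split a binary mask into 4-connected components using BFS flood fill."""
    mask = mask.astype(bool)
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    components = []
    for r0, c0 in zip(*np.nonzero(mask)):  # row-major scan of foreground pixels
        if seen[r0, c0]:
            continue
        comp = np.zeros_like(mask, dtype=bool)
        queue = deque([(r0, c0)])
        seen[r0, c0] = True
        while queue:
            r, c = queue.popleft()
            comp[r, c] = True
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and not seen[rr, cc]:
                    seen[rr, cc] = True
                    queue.append((rr, cc))
        components.append(comp)
    return components


def box_prompt(component: np.ndarray) -> tuple[int, int, int, int]:
    """Tight bounding box (x_min, y_min, x_max, y_max) around one component."""
    rows, cols = np.nonzero(component)
    return int(cols.min()), int(rows.min()), int(cols.max()), int(rows.max())


def point_prompt(component: np.ndarray) -> tuple[int, int]:
    """Component pixel farthest from the component boundary, returned as (x, y)."""
    component = component.astype(bool)
    inside = np.argwhere(component)
    # Pad with background so pixels beyond the image border count as outside.
    outside = np.argwhere(~np.pad(component, 1)) - 1  # undo the padding offset
    # Brute-force distance from each inside pixel to its nearest outside pixel.
    dists = np.sqrt(((inside[:, None, :] - outside[None, :, :]) ** 2).sum(-1)).min(1)
    r, c = inside[int(dists.argmax())]
    return int(c), int(r)
```

The brute-force distance computation is quadratic in the number of pixels and is meant only for small illustrative masks; a real pipeline would use a proper distance transform such as `scipy.ndimage.distance_transform_edt`.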

Abstract

Training segmentation models for medical images continues to be challenging due to the limited availability of data annotations. Segment Anything Model (SAM) is a foundation model intended to segment user-defined objects of interest in an interactive manner. While its performance on natural images is impressive, medical image domains pose their own set of challenges. Here, we perform an extensive evaluation of SAM's ability to segment medical images on a collection of 19 medical imaging datasets from various modalities and anatomies. We report the following findings: (1) SAM's performance based on single prompts varies widely depending on the dataset and the task, from IoU=0.1135 for spine MRI to IoU=0.8650 for hip X-ray. (2) Segmentation performance appears to be better for well-circumscribed objects whose prompts are less ambiguous, and poorer in various other scenarios, such as the segmentation of brain tumors. (3) SAM performs notably better with box prompts than with point prompts. (4) SAM outperforms the similar methods RITM, SimpleClick, and FocalClick in almost all single-point prompt settings. (5) When multiple-point prompts are provided iteratively, SAM's performance generally improves only slightly, while the other methods' performance improves to the point of surpassing SAM's point-based performance. We also provide several illustrations of SAM's performance on all tested datasets, of iterative segmentation, and of SAM's behavior given prompt ambiguity. We conclude that SAM shows impressive zero-shot segmentation performance for certain medical imaging datasets, but moderate to poor performance for others. SAM has the potential to make a significant impact on automated medical image segmentation, but it must be used with appropriate care.
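The IoU values quoted above compare a predicted mask P with the ground-truth mask G as |P ∩ G| / |P ∪ G|. A minimal sketch, assuming binary NumPy masks and a per-dataset score obtained by averaging per-image IoU; the exact aggregation used in the paper, and the handling of two empty masks, are assumptions here:

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over Union (Jaccard index) of two binary masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: scored as perfect agreement (a convention assumed here).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou(pairs) -> float:
    """Average per-image IoU over (prediction, ground truth) pairs, e.g. one dataset."""
    return float(np.mean([iou(p, g) for p, g in pairs]))
```

For example, a prediction covering the top row of a 2x2 image against a ground truth covering the left column shares 1 of 3 foreground pixels, giving IoU = 1/3.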

Paper Structure

This paper contains 17 sections, 17 figures, 2 tables, 1 algorithm.

Figures (17)

  • Figure 1: Examples of the prompt(s) generated by each of the five modes. Green contours show the ground-truth masks; blue stars and boxes indicate the prompts.
  • Figure 2: Performance of SAM under five modes of use. Left: performance of SAM across 28 segmentation tasks, ranked in descending order of Mode 4 performance; the inverted triangle marks each mode's oracle performance. Right: summary comparison of all five modes across all tasks, shown as box-and-whisker plots.
  • Figure 3: Visualization of SAM's segmentation results in two different modes. Each dataset occupies two consecutive rows, with its name along the left side, and shows three examples from left to right corresponding to the 25th, 50th, and 75th percentiles of IoU across all images in that dataset. For each example, we visualize (top left) the raw image; (bottom left) a zoomed-in view of the area of interest; (top right) the segmentation result for mode 2: 1 point at each object region; (bottom right) the segmentation result for mode 4: 1 box at each object region. The IoU is shown above each segmentation result. Examples from all datasets are shown in Appendix Figures 1-5.
  • Figure 4: Comparison of SAM with three competing methods, RITM, SimpleClick, and FocalClick, under the 1-point prompt setting. Results are shown as the difference between SAM and each other method ($\Delta$ IoU), with tasks ranked in descending order of their largest $\Delta$ IoU.
  • Figure 5: Comparison of SAM and other methods under an interactive prompt setting. Left: average performance of SAM and the other methods across all tasks as a function of the number of prompt changes. Right: detailed performance of SAM on each task.
  • ...and 12 more figures
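The interactive setting in Figure 5 requires simulating where a user would click next. One common rule in interactive segmentation benchmarks, sketched below as an assumption rather than the paper's documented protocol, compares the false-negative and false-positive regions of the current prediction and clicks at the error pixel lying farthest from the boundary of its region: a positive click for missed foreground, a negative click for spurious foreground. The function names `farthest_from_boundary` and `next_click` are hypothetical.

```python
import numpy as np


def farthest_from_boundary(region: np.ndarray) -> tuple[float, tuple[int, int]]:
    """Return (distance, (row, col)) of the region pixel farthest from any non-region pixel.

    Pixels beyond the image border count as non-region. Brute force, for small masks only.
    """
    region = region.astype(bool)
    inside = np.argwhere(region)
    outside = np.argwhere(~np.pad(region, 1)) - 1  # undo the padding offset
    dists = np.sqrt(((inside[:, None, :] - outside[None, :, :]) ** 2).sum(-1)).min(1)
    k = int(dists.argmax())
    return float(dists[k]), (int(inside[k, 0]), int(inside[k, 1]))


def next_click(pred: np.ndarray, gt: np.ndarray):
    """Simulate the next click as ((x, y), is_positive), or None if pred already equals gt."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    false_neg = gt & ~pred  # missed foreground -> positive click
    false_pos = pred & ~gt  # spurious foreground -> negative click
    best = None
    for region, positive in ((false_neg, True), (false_pos, False)):
        if not region.any():
            continue
        dist, (r, c) = farthest_from_boundary(region)
        # Prefer whichever error region contains the deepest (most interior) pixel.
        if best is None or dist > best[0]:
            best = (dist, (c, r), positive)
    if best is None:
        return None
    return best[1], best[2]
```

In an evaluation loop, each returned click would be added to the prompt set, the model re-run, and IoU recorded after every click, which yields a per-click performance curve of the kind summarized in Figure 5 (Left).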