Is SAM 2 Better than SAM in Medical Image Segmentation?

Sourya Sengupta, Satrajit Chakrabarty, Ravi Soni

TL;DR

This study asks whether SAM 2 offers a real advantage over SAM for 2D medical image segmentation in a zero-shot, promptable setting. The authors benchmark 24 organ-modality combinations across 11 public datasets under two point-prompt strategies: multiple positive prompts, and combined positive and negative prompts. SAM 2 generally does not outperform SAM on CT and ultrasound, but matches or surpasses it on MRI; negative prompts substantially improve performance for both models. The findings highlight modality-dependent differences between the two models and the practical value of prompting strategies for mitigating boundary ambiguity in medical images. The work informs model selection for clinical segmentation tasks and points toward extending the evaluation to 3D and time-series data.

Abstract

The Segment Anything Model (SAM) has demonstrated impressive performance in zero-shot promptable segmentation on natural images. The recently released Segment Anything Model 2 (SAM 2) claims to outperform SAM on images and extends the model's capabilities to video segmentation. Evaluating the performance of this new model in medical image segmentation, specifically in a zero-shot promptable manner, is crucial. In this work, we conducted extensive studies using multiple datasets from various imaging modalities to compare the performance of SAM and SAM 2. We employed two point-prompt strategies: (i) multiple positive prompts where one prompt is placed near the centroid of the target structure, while the remaining prompts are randomly placed within the structure, and (ii) combined positive and negative prompts where one positive prompt is placed near the centroid of the target structure, and two negative prompts are positioned outside the structure, maximizing the distance from the positive prompt and from each other. The evaluation encompassed 24 unique organ-modality combinations, including abdominal structures, cardiac structures, fetal head images, skin lesions and polyp images across 11 publicly available MRI, CT, ultrasound, dermoscopy, and endoscopy datasets. Preliminary results based on 2D images indicate that while SAM 2 may perform slightly better in a few cases, it does not generally surpass SAM for medical image segmentation. Notably, SAM 2 performs worse than SAM in lower contrast imaging modalities, such as CT and ultrasound. However, for MRI images, SAM 2 performs on par with or better than SAM. Like SAM, SAM 2 also suffers from over-segmentation issues, particularly when the boundaries of the target organ are fuzzy.
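The two point-prompt strategies described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. Several details are assumptions, since the abstract does not specify them: the "near the centroid" prompt is taken to be the foreground pixel closest to the mask centroid, and the negative prompts are chosen by a greedy farthest-point rule. The function names are hypothetical, and masks are plain nested lists so the sketch needs only the standard library.

```python
import math
import random


def foreground_pixels(mask):
    """Return (row, col) coordinates of all nonzero pixels in a 2D mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def background_pixels(mask):
    """Return (row, col) coordinates of all zero pixels in a 2D mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if not v]


def centroid_prompt(mask):
    """Pick the foreground pixel nearest to the mask centroid.

    Assumption: snapping to a foreground pixel keeps the prompt inside
    non-convex structures whose geometric centroid falls outside the mask.
    """
    fg = foreground_pixels(mask)
    if not fg:
        raise ValueError("mask has no foreground pixels")
    center = (sum(r for r, _ in fg) / len(fg), sum(c for _, c in fg) / len(fg))
    return min(fg, key=lambda p: math.dist(p, center))


def positive_prompts(mask, n, rng=None):
    """Strategy (i): one prompt near the centroid, the rest random inside the mask."""
    if n < 1:
        raise ValueError("need at least one positive prompt")
    rng = rng or random.Random(0)
    first = centroid_prompt(mask)
    others = [p for p in foreground_pixels(mask) if p != first]
    return [first] + rng.sample(others, min(n - 1, len(others)))


def negative_prompts(mask, positive, n=2):
    """Strategy (ii): background points far from the positive prompt and each other.

    Assumption: a greedy rule that repeatedly picks the background pixel whose
    smallest distance to all previously chosen points is largest.
    """
    bg = background_pixels(mask)
    anchors = [positive]
    chosen = []
    for _ in range(min(n, len(bg))):
        best = max(
            (p for p in bg if p not in chosen),
            key=lambda p: min(math.dist(p, a) for a in anchors),
        )
        chosen.append(best)
        anchors.append(best)
    return chosen


# Toy example: a 3x3 square structure inside a 5x5 image.
toy_mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
pos = positive_prompts(toy_mask, n=3)
neg = negative_prompts(toy_mask, positive=pos[0], n=2)
print("positive:", pos)
print("negative:", neg)
```

The coordinates here are (row, col) pairs. Promptable segmentation predictors commonly expect (x, y) ordering with a separate label per point (1 for positive, 0 for negative), so the pairs would need to be swapped before being passed to such a model.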

Paper Structure

This paper contains 10 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Performance trends of SAM and SAM 2 in terms of Dice Similarity Coefficient (DSC) with 1, 3, 5, 7, 9, and 10 positive prompts.
  • Figure 2: Qualitative comparison of SAM vs. SAM 2 performance along with the Dice Similarity Coefficient (DSC) values for different structures across all modalities. All results shown are for a single positive prompt (denoted by x in the images).
  • Figure 3: Quantitative comparison of SAM vs. SAM 2 performance in terms of Dice Similarity Coefficient (DSC) per structure across datasets. For both models, performance is shown for (1 positive, 0 negative) prompt and (1 positive, 2 negative) prompts.
  • Figure 4: Qualitative comparison of SAM vs. SAM 2 performance along with the Dice Similarity Coefficient (DSC) values for four structures, one each from the CT, MRI, ultrasound, and endoscopy modalities. For each subplot, the top and bottom rows show results for (1 positive, 0 negative) prompt and (1 positive, 2 negative) prompts, respectively. Positive and negative prompts are each denoted by an x marker in the images.
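
All four figures report the Dice Similarity Coefficient, defined for a predicted mask A and a reference mask B as DSC = 2|A ∩ B| / (|A| + |B|). The sketch below shows one way to compute it for binary 2D masks stored as nested lists. The function name is hypothetical, and returning 1.0 when both masks are empty is an assumed convention; the listing does not say how the paper handles that case.

```python
def dice_similarity(pred, truth):
    """Compute the Dice Similarity Coefficient between two binary 2D masks.

    Both masks must have the same shape; any nonzero value counts as foreground.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of rows")
    intersection = pred_area = truth_area = 0
    for pred_row, truth_row in zip(pred, truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("masks must have the same number of columns")
        for p, t in zip(pred_row, truth_row):
            if p:
                pred_area += 1
            if t:
                truth_area += 1
            if p and t:
                intersection += 1
    if pred_area + truth_area == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / (pred_area + truth_area)


# Toy example: the prediction over-segments the reference by one pixel.
reference = [[1, 0], [0, 0]]
prediction = [[1, 1], [0, 0]]
print(dice_similarity(prediction, reference))
```

Because the predicted area appears in the denominator, over-segmentation of the kind the abstract describes lowers the DSC even when the whole target structure is covered.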