Segment Anything in Medical Images and Videos: Benchmark and Deployment

Jun Ma, Sumin Kim, Feifei Li, Mohammed Baharoon, Reza Asakereh, Hongwei Lyu, Bo Wang

TL;DR

This paper benchmarks SAM2 across 11 medical imaging modalities, covering 2D images, 3D volumes, and videos, and compares it with SAM1 and MedSAM to identify domain-specific strengths and weaknesses. It demonstrates a practical transfer-learning workflow for adapting SAM2 to medical segmentation and introduces deployment tools, a 3D Slicer plugin and a Gradio API, to make the models easier for clinicians to use. The gains are modality-dependent: SAM2 excels on some CT/MR tasks and on 3D segmentation through its video-style mask propagation, while MedSAM remains superior on several 2D modalities; fine-tuning can yield large improvements in 3D segmentation. The work also emphasizes practical deployment and suggests future directions such as expanding to more 3D modalities, incorporating text prompts, and reducing model size for broader clinical adoption.

Abstract

Recent advances in segmentation foundation models have enabled accurate and efficient segmentation across a wide range of natural images and videos, but their utility for medical data remains unclear. In this work, we first present a comprehensive benchmark of the Segment Anything Model 2 (SAM2) across 11 medical image modalities and videos and point out its strengths and weaknesses by comparing it with SAM1 and MedSAM. Then, we develop a transfer learning pipeline and demonstrate that SAM2 can be quickly adapted to the medical domain by fine-tuning. Furthermore, we implement SAM2 as a 3D Slicer plugin and a Gradio API for efficient 3D image and video segmentation. The code is publicly available at https://github.com/bowang-lab/MedSAM.

Paper Structure

This paper contains 10 sections, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Datasets and evaluation protocol. We evaluate SAM2 on a range of 2D and 3D medical images and videos. For 2D images, the image and a bounding box prompt are passed directly to SAM2 to generate the segmentation. For 3D images, the bounding box prompt is placed on the middle slice; for videos, it is placed on the first frame. The model segments this initial slice or frame and then propagates the resulting 2D mask to the remaining slices or frames.
  • Figure 2: Dot and box plots of the DSC scores for 2D image segmentation across 11 modalities. The horizontal solid line within each box marks the median, the box borders mark the lower and upper quartiles, and the vertical black lines indicate 1.5×IQR.
  • Figure 3: a, Dot and box plots of the DSC scores for 3D image segmentation. The horizontal solid line within each box marks the median, the box borders mark the lower and upper quartiles, and the vertical black lines indicate 1.5×IQR. b, Visualized MR and PET segmentation examples from the best-performing SAM2 model. The bounding box prompt is placed on the middle slice, and the generated mask is propagated toward the top and bottom slices.
  • Figure 4: Visualized examples from the best video segmentation model. When the object boundary is unclear or the image contrast is low, the model either fails to segment the first frame or produces over-segmentation errors during mask propagation.
  • Figure 5: SAM2 deployment for medical image and video annotation. a, Slicer plugin for 3D medical image segmentation. b, Gradio API for video segmentation.
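The evaluation protocol in Figure 1, and the propagation shown in Figure 3b, can be read as a schedule: segment one prompted slice, then walk outward in both directions. The Python below is a minimal sketch under stated assumptions. `init_fn` and `step_fn` are hypothetical callbacks standing in for the model (they are not part of the SAM2 API), and each step is conditioned only on the neighbouring mask, which simplifies SAM2's memory-based propagation. Passing `start=0` reproduces the video setting, where propagation runs forward from the first frame only.

```python
from typing import Callable, Dict, List, Optional, Tuple, TypeVar

M = TypeVar("M")  # whatever mask type the hypothetical callbacks produce


def propagation_schedule(
    num_slices: int, start: Optional[int] = None
) -> Tuple[int, List[int], List[int]]:
    """Return the prompted index plus the forward and backward visiting orders.

    By default the prompt goes on the middle slice (3D images); use start=0
    for videos, where the prompt goes on the first frame.
    """
    if num_slices <= 0:
        raise ValueError("num_slices must be positive")
    if start is None:
        start = num_slices // 2
    if not 0 <= start < num_slices:
        raise ValueError("start must index an existing slice")
    forward = list(range(start + 1, num_slices))  # towards the last slice/frame
    backward = list(range(start - 1, -1, -1))  # towards the first slice
    return start, forward, backward


def run_protocol(
    num_slices: int,
    init_fn: Callable[[int], M],
    step_fn: Callable[[M, int], M],
    start: Optional[int] = None,
) -> Dict[int, M]:
    """Segment the prompted slice, then propagate its mask in both directions.

    init_fn(idx) stands in for box-prompted segmentation of one slice;
    step_fn(prev_mask, idx) stands in for propagating a mask to slice idx.
    Both are hypothetical placeholders, not SAM2 functions.
    """
    start, forward, backward = propagation_schedule(num_slices, start)
    masks: Dict[int, M] = {start: init_fn(start)}
    for order in (forward, backward):
        prev = start  # each direction restarts from the prompted slice
        for idx in order:
            masks[idx] = step_fn(masks[prev], idx)
            prev = idx
    return masks


# Toy usage with string "masks" that record the propagation path.
masks = run_protocol(5, lambda i: f"box@{i}", lambda prev, i: f"{prev}->{i}")
print(masks[4], masks[0])  # box@2->3->4 box@2->1->0
```

Keeping the schedule separate from the model calls makes the protocol easy to check on its own; a real pipeline would replace the two callbacks with actual model inference.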
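Figures 2 and 3 summarize per-case DSC scores with box plots. The sketch below assumes DSC is the standard Dice similarity coefficient, 2|A∩B| / (|A| + |B|), computed on binary masks, and that the whiskers follow the common Tukey convention of extending to the most extreme score within 1.5×IQR of the box. The quartile method (NumPy's default linear interpolation) and the score of 1.0 for two empty masks are assumptions, not details taken from the paper.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    denom = int(pred.sum()) + int(gt.sum())
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * int(np.logical_and(pred, gt).sum()) / denom


def box_stats(scores) -> dict:
    """Median, quartiles, Tukey whiskers (1.5 x IQR) and outliers of a score list."""
    s = np.asarray(scores, dtype=float)
    q1, median, q3 = np.percentile(s, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = s[(s >= low_fence) & (s <= high_fence)]
    return {
        "median": float(median),
        "q1": float(q1),
        "q3": float(q3),
        # whiskers stop at the most extreme observed scores inside the fences
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": s[(s < low_fence) | (s > high_fence)].tolist(),
    }


pred = np.array([[1, 1, 0, 0]])
gt = np.array([[1, 0, 0, 0]])
print(round(dice(pred, gt), 4))  # 0.6667
print(box_stats([0.1, 0.8, 0.85, 0.9, 0.95])["outliers"])  # [0.1]
```

Plotting libraries differ in quartile and whisker defaults, so results from this sketch may not match a given figure exactly.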