Segment Anything in Medical Images and Videos: Benchmark and Deployment
Jun Ma, Sumin Kim, Feifei Li, Mohammed Baharoon, Reza Asakereh, Hongwei Lyu, Bo Wang
TL;DR
This paper benchmarks SAM2 across 11 medical imaging modalities, covering 2D images, 3D volumes, and videos, and compares it with SAM1 and MedSAM to identify domain-specific strengths and weaknesses. It demonstrates a practical transfer-learning workflow for adapting SAM2 to medical segmentation. It also provides deployment tools, a 3D Slicer plugin and a Gradio API, to make the model easier for clinicians to use. The gains are modality-dependent. SAM2 excels in some CT/MR tasks and in video-enabled 3D segmentation, while MedSAM remains superior in several 2D modalities, and fine-tuning can yield large improvements in 3D segmentation. The work also emphasizes practical deployment and suggests future directions such as expanding coverage of 3D modalities, incorporating text prompts, and reducing model size for broader clinical adoption.
Abstract
Recent advances in segmentation foundation models have enabled accurate and efficient segmentation across a wide range of natural images and videos, but their utility for medical data remains unclear. In this work, we first present a comprehensive benchmark of the Segment Anything Model 2 (SAM2) across 11 medical image modalities and videos. We identify its strengths and weaknesses by comparing it with SAM1 and MedSAM. We then develop a transfer learning pipeline and demonstrate that SAM2 can be quickly adapted to the medical domain through fine-tuning. Furthermore, we implement SAM2 as a 3D Slicer plugin and a Gradio API for efficient 3D image and video segmentation. The code is publicly available at https://github.com/bowang-lab/MedSAM.
