Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey

Yichi Zhang, Zhenrong Shen

TL;DR

The paper surveys SAM2’s applicability to biomedical images and videos, highlighting how its streaming, memory-based segmentation can reduce annotation needs and enable zero-shot results while revealing substantial performance variability across modalities. It synthesizes architecture details of SAM and SAM2, evaluates 3D medical data treated as video-like slices, and reviews multiple adaptation efforts (BioSAM2, MedSAM-2, SAM2-PATH, SurgSAM-2). Key findings show SAM2 can approach or exceed supervised methods in certain medical videos but often lags on heterogeneous 3D scans and static modalities, underscoring the importance of adaptation and prompting strategies. The work emphasizes potential clinical impact and provides a public repository to support ongoing work in biomedical SAM2 applications.
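The idea of treating 3D medical data as video-like slices can be illustrated with a minimal sketch. It is not taken from the paper or from the SAM2 codebase: the function names `window_to_uint8` and `propagation_order`, the intensity-window parameters `lo` and `hi`, and the choice to propagate in both directions from a prompted slice are all assumptions made for illustration. The sketch only shows how a volume might be prepared as frames and in what order a streaming, memory-based model could visit them.

```python
from typing import List, Sequence, Tuple


def window_to_uint8(
    slice_2d: Sequence[Sequence[float]], lo: float, hi: float
) -> List[List[int]]:
    """Clip one slice to the window [lo, hi] and rescale it to 0..255.

    Hypothetical helper: video-style pipelines usually expect 8-bit frames,
    whereas CT/MRI intensities have a much wider range.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scale = 255.0 / (hi - lo)
    return [
        [int(round((min(max(v, lo), hi) - lo) * scale)) for v in row]
        for row in slice_2d
    ]


def propagation_order(
    num_slices: int, prompt_index: int
) -> Tuple[List[int], List[int]]:
    """Return the slice indices for a forward and a backward pass.

    Assumption: the user prompts a single slice, and the mask is then
    propagated frame by frame towards both ends of the volume.
    """
    if not 0 <= prompt_index < num_slices:
        raise ValueError("prompt_index must lie inside the volume")
    forward = list(range(prompt_index, num_slices))
    backward = list(range(prompt_index, -1, -1))
    return forward, backward


if __name__ == "__main__":
    # A toy 5-slice volume with the prompt placed on the middle slice.
    forward, backward = propagation_order(num_slices=5, prompt_index=2)
    print("forward pass :", forward)
    print("backward pass:", backward)
```

In this sketch the prompted slice appears at the start of both passes, mirroring how a prompted frame would seed the memory before neighbouring slices are segmented.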

Abstract

The unprecedented developments in segmentation foundation models have become a dominant force in the field of computer vision, introducing a multitude of previously unexplored capabilities in a wide range of natural images and videos. Specifically, the Segment Anything Model (SAM) signifies a noteworthy expansion of the prompt-driven paradigm into the domain of image segmentation. The recent introduction of SAM2 extends the original SAM to operate in a streaming fashion and demonstrates strong performance in video segmentation. However, due to the substantial differences between natural and medical images, the effectiveness of these models on biomedical images and videos is still under exploration. This paper presents an overview of recent efforts in applying and adapting SAM2 to biomedical images and videos. The findings indicate that while SAM2 shows promise in reducing annotation burdens and enabling zero-shot segmentation, its performance varies across different datasets and tasks. Addressing the domain gap between natural and medical images through adaptation and fine-tuning is essential to fully unleash SAM2's potential in clinical applications. To support ongoing research endeavors, we maintain an active repository that contains up-to-date SAM & SAM2-related papers and projects at https://github.com/YichiZhang98/SAM4MIS.

Paper Structure (8 sections, 1 figure)

Figures (1)

  • Figure 1: Overview of the SAM2 architecture and workflow for the segmentation task of biomedical images and videos.