Biomedical SAM 2: Segment Anything in Biomedical Images and Videos

Zhiling Yan; Weixiang Sun; Rong Zhou; Zhengqing Yuan; Kai Zhang; Yiwei Li; Tianming Liu; Quanzheng Li; Xiang Li; Lifang He; Lichao Sun

TL;DR

The paper evaluates SAM-2 in medical imaging, identifies a domain gap that hinders zero-shot medical segmentation, and introduces BioSAM-2, a memory-enabled, domain-adapted version that freezes the prompt encoder while fine-tuning the image encoder and mask decoder. Through three dedicated biomedical pipelines and extensive experiments on 8 modalities and 22 targets, BioSAM-2 consistently surpasses state-of-the-art foundation models and rivals specialized medical models. The results demonstrate BioSAM-2’s strong generalization across 2D and 3D image tasks and video segmentation, indicating a promising direction for versatile, clinically useful biomedical AI tools. The work highlights the value of memory mechanisms and targeted fine-tuning for domain-specific segmentation and points to future integration with clinical workflows to improve annotation efficiency and diagnostic accuracy.
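The adaptation recipe described above (freeze the prompt encoder, fine-tune the image encoder and mask decoder) can be illustrated with a minimal PyTorch sketch. `ToySegmenter` and `freeze_prompt_encoder` are hypothetical stand-ins invented for this illustration, not the SAM-2 or BioSAM-2 code, and the loss, optimizer, and learning rate are assumptions rather than the authors' training setup.

```python
import torch
from torch import nn


class ToySegmenter(nn.Module):
    """Hypothetical stand-in with the three components named in the text."""

    def __init__(self) -> None:
        super().__init__()
        self.image_encoder = nn.Conv2d(1, 4, kernel_size=3, padding=1)
        self.prompt_encoder = nn.Linear(2, 4)  # embeds one (x, y) click
        self.mask_decoder = nn.Conv2d(4, 1, kernel_size=1)

    def forward(self, image: torch.Tensor, click: torch.Tensor) -> torch.Tensor:
        feats = self.image_encoder(image)                      # (B, 4, H, W)
        prompt = self.prompt_encoder(click)[:, :, None, None]  # (B, 4, 1, 1)
        return self.mask_decoder(feats + prompt)               # (B, 1, H, W) logits


def freeze_prompt_encoder(model: ToySegmenter) -> list:
    """Disable gradients for the prompt encoder; return the trainable parameters."""
    for param in model.prompt_encoder.parameters():
        param.requires_grad = False
    return [p for p in model.parameters() if p.requires_grad]


torch.manual_seed(0)
model = ToySegmenter()
trainable = freeze_prompt_encoder(model)
# Assumed optimizer and learning rate, chosen only for the illustration.
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

# Random stand-in data: two 8x8 single-channel images, one click each.
image = torch.randn(2, 1, 8, 8)
click = torch.rand(2, 2)
target = (torch.rand(2, 1, 8, 8) > 0.5).float()

prompt_before = [p.clone() for p in model.prompt_encoder.parameters()]
decoder_before = model.mask_decoder.weight.clone()

# One training step with an assumed binary cross-entropy loss.
loss = nn.functional.binary_cross_entropy_with_logits(model(image, click), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After the step, the prompt encoder weights are unchanged while the image encoder and mask decoder have been updated. Passing only the trainable parameters to the optimizer also keeps optimizer state from being allocated for the frozen module.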

Abstract

Medical image segmentation and video object segmentation are essential for diagnosing and analyzing diseases because they identify and measure biological structures. Recent advances in the natural domain have been driven by foundation models such as the Segment Anything Model 2 (SAM-2). To explore SAM-2's performance in biomedical applications, we designed three evaluation pipelines, covering single-frame 2D image segmentation, multi-frame 3D image segmentation, and multi-frame video segmentation, each with varied prompt designs; these evaluations reveal SAM-2's limitations in medical contexts. Consequently, we developed BioSAM-2, an enhanced foundation model based on SAM-2 and optimized for biomedical data. Our experiments show that BioSAM-2 not only surpasses existing state-of-the-art foundation models but also matches or even exceeds specialist models, demonstrating its efficacy and potential in the medical domain.

Paper Structure (16 sections, 4 equations, 4 figures, 4 tables)

This paper contains 16 sections, 4 equations, 4 figures, 4 tables.

Introduction
Related Work
Method
Preliminary Study of SAM-2
Medical Applications of SAM-2
BioSAM-2: Dedicated biomedical segmentation foundation model
Experiments
Biomedical Image Segmentation
Datasets
Experimental Setup
Results
Biomedical Video Segmentation
Datasets
Experimental Setup
Results
...and 1 more sections

Figures (4)

Figure 2: Image segmentation results of tiny SAM-2 and large SAM-2 based on different segmentation prompts.
Figure 3: The workflow of the proposed BioSAM-2. We adapt SAM-2 for medical image and video segmentation by freezing the prompt encoder and fine-tuning only the image encoder and mask decoder.
Figure 4: Visualization results of SAM-2 in 3-click applications on two medical scenarios.
Figure 5: A failure example of SAM-2 in 1-click applications on low-resolution ultrasound videos.
