Biomedical SAM 2: Segment Anything in Biomedical Images and Videos
Zhiling Yan, Weixiang Sun, Rong Zhou, Zhengqing Yuan, Kai Zhang, Yiwei Li, Tianming Liu, Quanzheng Li, Xiang Li, Lifang He, Lichao Sun
TL;DR
The paper evaluates SAM-2 in medical imaging, identifies a domain gap that hinders zero-shot medical segmentation, and introduces BioSAM-2, a memory-enabled, domain-adapted version that freezes the prompt encoder while fine-tuning the image encoder and mask decoder. Through three dedicated biomedical pipelines and extensive experiments on 8 modalities and 22 targets, BioSAM-2 consistently surpasses state-of-the-art foundation models and rivals specialized medical models. The results demonstrate BioSAM-2’s strong generalization across 2D and 3D image tasks and video segmentation, indicating a promising direction for versatile, clinically useful biomedical AI tools. The work highlights the value of memory mechanisms and targeted fine-tuning for domain-specific segmentation and points to future integration with clinical workflows to improve annotation efficiency and diagnostic accuracy.
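The selective fine-tuning described above (prompt encoder frozen, image encoder and mask decoder updated) can be illustrated with a minimal, framework-agnostic sketch. The component prefixes and parameter names below are hypothetical placeholders, not the actual SAM-2 or BioSAM-2 module names, and the sketch only shows how parameters might be partitioned by component. It is not the authors' training code.

```python
# Hypothetical component prefix; the real SAM-2 module names may differ.
FROZEN_PREFIXES = ("prompt_encoder.",)


def split_trainable(param_names, frozen_prefixes=FROZEN_PREFIXES):
    """Partition parameter names into (trainable, frozen) lists by name prefix."""
    trainable, frozen = [], []
    for name in param_names:
        if name.startswith(tuple(frozen_prefixes)):
            frozen.append(name)      # kept fixed during fine-tuning
        else:
            trainable.append(name)   # updated during fine-tuning
    return trainable, frozen


# Toy parameter names standing in for a real checkpoint (assumed for illustration).
names = [
    "image_encoder.blocks.0.weight",
    "prompt_encoder.point_embed.weight",
    "mask_decoder.output_head.weight",
]
trainable, frozen = split_trainable(names)
```

In a PyTorch setting, one would typically apply such a split by setting `requires_grad = False` on the frozen parameters and passing only the trainable ones to the optimizer.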
Abstract
Medical image segmentation and video object segmentation are essential for diagnosing and analyzing diseases by identifying and measuring biological structures. Recent advances in the natural image domain have been driven by foundation models such as the Segment Anything Model 2 (SAM-2). To explore the performance of SAM-2 in biomedical applications, we designed three evaluation pipelines with varied prompt designs, covering single-frame 2D image segmentation, multi-frame 3D image segmentation, and multi-frame video segmentation. These evaluations reveal SAM-2's limitations in medical contexts. Consequently, we developed BioSAM-2, an enhanced foundation model based on SAM-2 and optimized for biomedical data. Our experiments show that BioSAM-2 not only surpasses existing state-of-the-art foundation models but also matches or even exceeds specialist models, demonstrating its efficacy and potential in the medical domain.
