Foundation Models for Biomedical Image Segmentation: A Survey

Ho Hin Lee, Yu Gu, Theodore Zhao, Yanbo Xu, Jianwei Yang, Naoto Usuyama, Cliff Wong, Mu Wei, Bennett A. Landman, Yuankai Huo, Alberto Santamaria-Pang, Hoifung Poon

TL;DR

This survey analyzes how the Segment Anything Model (SAM) can be repurposed for biomedical image segmentation, focusing on the initial six months after SAM’s introduction. It catalogs four core adaptation strategies—zero-shot evaluation, domain-specific tuning (projection/adapter/full), 3D extensions, and knowledge distillation—assessing them across 33 open datasets and multiple modalities. The study finds that SAM can achieve competitive zero-shot performance in several radiology and camera tasks but struggles with certain anatomies and highly fine-grained pathology, underscoring the need for domain adaptation, 3D integration, and metadata-aware approaches. The work highlights the practical potential of SAM to reduce labeling demands and enable rapid deployment, while outlining concrete research directions to improve robustness, interpretability, and clinical alignment.

Abstract

Recent advancements in biomedical image analysis have been significantly driven by the Segment Anything Model (SAM). This transformative technology, originally developed for general-purpose computer vision, has found rapid application in medical image processing. Within the last year, marked by over 100 publications, SAM has demonstrated its prowess in zero-shot learning adaptations for medical imaging. The fundamental premise of SAM lies in its capability to segment or identify objects in images without prior knowledge of the object type or imaging modality. This approach aligns well with tasks achievable by the human visual system, though its application in non-biological vision contexts remains more theoretically challenging. A notable feature of SAM is its ability to adjust segmentation according to a specified resolution scale or area of interest, akin to semantic priming. This adaptability has spurred a wave of creativity and innovation in applying SAM to medical imaging. Our review focuses on the period from April 1, 2023, to September 30, 2023, the critical first six months after SAM's initial publication. We examine the adaptations and integrations of SAM necessary to address longstanding clinical challenges, particularly in the context of 33 open datasets covered in our analysis. While SAM approaches or achieves state-of-the-art performance in numerous applications, it falls short in certain areas, such as segmentation of the carotid artery, adrenal glands, optic nerve, and mandible bone. Our survey delves into the innovative techniques where SAM's foundational approach excels and explores the core concepts in translating and applying these models effectively in diverse medical imaging scenarios.

Paper Structure

This paper contains 30 sections, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Evolution of SAM's adaptation in medical research from April to September 2023. The graph showcases the cumulative studies emphasizing four phases: (i) Zero-shot Evaluation, (ii) Multi-dimensional Extension, (iii) Domain-specific Tuning, and (iv) Knowledge Distillation, highlighting a growing research interest in optimizing SAM for medical image segmentation.
  • Figure 2: The distribution of medical image segmentation data by number of scans/images in the collection of public datasets in Table \ref{tab:t2i-dataset}. Inclusion criteria for the pie charts are: (1) the images are real-world data; (2) the annotation process involved domain experts.
  • Figure 3: Application of SAM Across Medical Imaging Modalities. The figure showcases Radiology, Pathology, and Camera Imaging examples. Central components of SAM, including the Image Encoder, Mask Decoder, and Prompt Encoder, are delineated. Methods ranging from Zero-shot Evaluation to Knowledge Distillation are accentuated within tan boxes.
  • Figure 4: Decomposition of SAM Adaptation Methods in Medical Imaging. An illustrative overview of various adaptation strategies of SAM for the medical domain. The figure showcases five key methodologies: Zero-shot Evaluation, which assesses SAM's inherent ability for medical image segmentation; Adapter, Projection, and Full Tuning, which represent different degrees of model fine-tuning; 3D Extension, highlighting SAM's adaptation for volumetric data; and Knowledge Distillation, where SAM's expertise is transferred to a student model. Each method's flow from input to output, accompanied by specific components and modules, is visualized for clarity.
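
As a rough illustration of the knowledge distillation idea named in the Figure 4 caption, where a student model learns from SAM as the teacher, the sketch below computes a per-pixel loss between softened teacher and student mask predictions. It is not taken from any surveyed method: the function name `distillation_loss`, the `temperature` parameter, and the choice of binary cross-entropy against the teacher's soft probabilities are all assumptions made for illustration.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,  # hypothetical default; softens both distributions
) -> float:
    """Mean per-pixel binary cross-entropy of the student against the
    teacher's softened foreground probabilities (flattened masks)."""
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student masks must have the same size")
    if len(teacher_logits) == 0:
        raise ValueError("masks must contain at least one pixel")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    eps = 1e-7  # keeps log() away from 0
    total = 0.0
    for t, s in zip(teacher_logits, student_logits):
        p = sigmoid(t / temperature)  # teacher soft label
        q = sigmoid(s / temperature)  # student prediction
        q = min(max(q, eps), 1.0 - eps)
        total += -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))
    return total / len(teacher_logits)


# Toy usage: a student that matches the teacher scores a lower loss
# than a student that predicts the opposite mask.
teacher = [2.0, -1.0, 0.5, -3.0]
matching_student = list(teacher)
opposite_student = [-v for v in teacher]
loss_match = distillation_loss(teacher, matching_student)
loss_opposite = distillation_loss(teacher, opposite_student)
```

In this toy setup the loss is smallest when the student reproduces the teacher's probabilities exactly, which is the behavior a distillation objective is meant to encourage. A practical pipeline would typically combine such a term with a supervised loss on expert annotations.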