Input Augmentation with SAM: Boosting Medical Image Segmentation with Segmentation Foundation Model

Yizhe Zhang, Tao Zhou, Shuo Wang, Peixian Liang, Danny Z. Chen

TL;DR

The paper tackles the challenge of leveraging a general segmentation foundation model (SAM) for medical image segmentation. It introduces SAMAug, which uses SAM-generated segmentation and boundary priors to augment input images as additional channels, enabling downstream models to better exploit semantic structure without fine-tuning SAM. Training remains lightweight, with a simple fusion module and flexible loss blending, and deployment can utilize augmented inputs or ensemble strategies. Across polyp, MoNuSeg, and GlaS tasks, SAMAug improves performance for both CNN and Transformer-based architectures, demonstrating the practical potential of foundation-model priors in clinical imaging.

Abstract

The Segment Anything Model (SAM) is a recently developed large model for general-purpose segmentation for computer vision tasks. SAM was trained using 11 million images with over 1 billion masks and can produce segmentation results for a wide range of objects in natural scene images. SAM can be viewed as a general perception model for segmentation (partitioning images into semantically meaningful regions). Thus, how to utilize such a large foundation model for medical image segmentation is an emerging research target. This paper shows that although SAM does not immediately give high-quality segmentation for medical image data, its generated masks, features, and stability scores are useful for building and training better medical image segmentation models. In particular, we demonstrate how to use SAM to augment image input for commonly-used medical image segmentation models (e.g., U-Net). Experiments on three segmentation tasks show the effectiveness of our proposed SAMAug method. The code is available at https://github.com/yizhezhang2000/SAMAug.
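
To make the input-augmentation idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes SAM's masks and stability scores are already available (for example, from the `segmentation` and `stability_score` entries returned by the `segment_anything` automatic mask generator). The function names `boundary_of` and `augment_with_priors` are hypothetical, and stacking the two prior maps as extra channels is one reading of the TL;DR; the exact fusion used in the SAMAug code may differ.

```python
import numpy as np


def boundary_of(mask: np.ndarray) -> np.ndarray:
    """Return the pixels of a boolean mask that touch background (4-neighbourhood)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    # A pixel is interior if its up, down, left, and right neighbours are all in the mask.
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1] & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    return mask & ~interior


def augment_with_priors(image, masks, stability_scores):
    """Append a segmentation prior and a boundary prior to an (H, W, C) image.

    masks: list of (H, W) boolean arrays (assumed to come from SAM).
    stability_scores: one float per mask (assumed to come from SAM).
    Returns an (H, W, C + 2) float32 array.
    """
    h, w = image.shape[:2]
    seg_prior = np.zeros((h, w), dtype=np.float32)
    boundary_prior = np.zeros((h, w), dtype=np.float32)
    for mask, score in zip(masks, stability_scores):
        seg_prior[mask] = score                    # paint each region with its stability score
        boundary_prior[boundary_of(mask)] = 1.0    # mark the outline of each region
    return np.concatenate(
        [image.astype(np.float32), seg_prior[..., None], boundary_prior[..., None]],
        axis=-1,
    )
```

A downstream segmentation model (e.g., a U-Net) would then be configured to accept the widened input, while SAM itself stays frozen.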

Paper Structure

This paper contains 13 sections, 6 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Input augmentation with SAM for boosting medical image segmentation.
  • Figure 2: Visual examples of a raw input image, its segmentation prior map by SAM, boundary prior map by SAM, and SAM-augmented image input (illustrated in Fig. 1). The image sample is from the MoNuSeg dataset (Kumar et al., 2017).
  • Figure 3: Polyp segmentation results of the vanilla HSNet and SAMAug-enhanced HSNet.
  • Figure 4: Visual result comparisons of the vanilla HSNet and SAMAug-enhanced HSNet in polyp segmentation.
  • Figure 5: Visual comparisons of segmentation results on the MoNuSeg dataset.