A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering

Chaoning Zhang; Joseph Cho; Fachrina Dewi Puspitasari; Sheng Zheng; Chenghao Li; Yu Qiao; Taegoo Kang; Xinru Shan; Chenshuang Zhang; Caiyan Qin; Francois Rameau; Lik-Hang Lee; Sung-Ho Bae; Choong Seon Hong

TL;DR

This survey provides a comprehensive exploration of the SAM family, including SAM and SAM 2, highlights their advancements in granularity and contextual understanding, and suggests future research directions, including domain-specific adaptations and enhanced memory and propagation mechanisms.

Abstract

The Segment Anything Model (SAM), developed by Meta AI Research, represents a significant breakthrough in computer vision, offering a robust framework for image and video segmentation. This survey provides a comprehensive exploration of the SAM family, including SAM and SAM 2, highlighting their advancements in granularity and contextual understanding. Our study demonstrates SAM's versatility across a wide range of applications while identifying areas where improvements are needed, particularly in scenarios requiring high granularity and in the absence of explicit prompts. By mapping the evolution and capabilities of SAM models, we offer insights into their strengths and limitations and suggest future research directions, including domain-specific adaptations and enhanced memory and propagation mechanisms. We believe that this survey comprehensively covers the breadth of SAM's applications and challenges, setting the stage for ongoing advancements in segmentation technology.

Paper Structure (54 sections, 14 figures, 1 table)

This paper contains 54 sections, 14 figures, 1 table.

Introduction
Brief Overview
Method
Contribution
Importance of SAM in Vision Tasks
Innovations in Segmentation Granularity
Historical Techniques and Developments
Granularity in Object Recognition
SAM-powered Applications
Image-based Applications
Generation
Inpainting, Stylizing, and Restoration
Annotations
Matching
Video-based Applications
...and 39 more sections

Figures (14)

Figure 1: Overview of our work
Figure 2: Distribution of selected papers across some key venues
Figure 3: Distribution of studies citing SAM across various application domains, highlighting its versatility and broad impact in fields such as medical imaging, robotics, vision-language applications, and more.
Figure 4: SAM in image generation models. The diagram shows how SAM enhances various image generation tasks by integrating with frameworks like Concept Weaver, HiFi Tuner, and Salient Object-Aware Background Generation. SAM contributes to tasks such as multi-concept generation, character management, object reconstruction, and text-image alignment, emphasizing its versatility in addressing complex vision challenges through precise segmentation.
Figure 5: Example of SAM application in image restoration, showing how segmentation maps guide noise modulation and object-level processing for improved restoration quality.
...and 9 more figures
