A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering

Chaoning Zhang, Joseph Cho, Fachrina Dewi Puspitasari, Sheng Zheng, Chenghao Li, Yu Qiao, Taegoo Kang, Xinru Shan, Chenshuang Zhang, Caiyan Qin, Francois Rameau, Lik-Hang Lee, Sung-Ho Bae, Choong Seon Hong

TL;DR

This survey provides a comprehensive exploration of the SAM family, including SAM and SAM 2, highlights their advancements in granularity and contextual understanding, and suggests future research directions, including domain-specific adaptations and enhanced memory and propagation mechanisms.

Abstract

The Segment Anything Model (SAM), developed by Meta AI Research, represents a significant breakthrough in computer vision, offering a robust framework for image and video segmentation. This survey provides a comprehensive exploration of the SAM family, including SAM and SAM 2, highlighting their advancements in granularity and contextual understanding. Our study demonstrates SAM's versatility across a wide range of applications while identifying areas where improvements are needed, particularly in scenarios requiring high granularity and in the absence of explicit prompts. By mapping the evolution and capabilities of SAM models, we offer insights into their strengths and limitations and suggest future research directions, including domain-specific adaptations and enhanced memory and propagation mechanisms. We believe that this survey comprehensively covers the breadth of SAM's applications and challenges, setting the stage for ongoing advancements in segmentation technology.

Paper Structure (54 sections, 14 figures, 1 table)

Figures (14)

  • Figure 1: Overview of our work
  • Figure 2: Distribution of selected papers across some key venues
  • Figure 3: Distribution of studies citing SAM across various application domains, highlighting its versatility and broad impact in fields such as medical imaging, robotics, vision-language applications, and more.
  • Figure 4: SAM in image generation models. The diagram shows how SAM enhances various image generation tasks by integrating with frameworks like Concept Weaver, HiFi Tuner, and Salient Object-Aware Background Generation. SAM contributes to tasks such as multi-concept generation, character management, object reconstruction, and text-image alignment, emphasizing its versatility in addressing complex vision challenges through precise segmentation.
  • Figure 5: Example of SAM application in image restoration, showing how segmentation maps guide noise modulation and object-level processing for improved restoration quality.
  • ...and 9 more figures