A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang, Li Liu, Yawen Cui, Guanjie Huang, Weilin Lin, Yiqian Yang, Yuehong Hu
TL;DR
The paper surveys the Segment Anything Model (SAM) as a foundational, promptable segmentation framework, tracing its origins, its training dataset (SA-1B), and concurrent developments while examining its applications across image processing, medical imaging, video, and multimodal tasks. It emphasizes SAM’s potential as a unifying platform for zero-shot segmentation and outlines its advantages, limitations, and future directions, including adaptation to 3D and non-Euclidean domains and to robust, explainable AI. By aggregating a wide range of SAM-based extensions and open-source efforts, the work gives a consolidated view of the current landscape and practical guidance for researchers designing versatile foundation models. The survey also stresses the need for continuous updates as the field evolves rapidly, with open datasets and toolchains accelerating research and deployment. Overall, the paper frames SAM as a pivotal step toward more general, adaptable vision systems and, more broadly, toward artificial general intelligence.
Abstract
Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence similar to that of a human being. This is in contrast to narrow or specialized AI, which is designed to perform specific tasks with a high degree of efficiency. There is therefore a pressing need to design a general class of models, which we term foundation models, that are trained on broad data and can be adapted to various downstream tasks. The recently proposed Segment Anything Model (SAM) has made significant progress in breaking the boundaries of segmentation, greatly promoting the development of foundation models for computer vision. To fully comprehend SAM, we conduct a survey study. As the first work to comprehensively review the progress of the segment-anything task for vision and beyond based on the SAM foundation model, this survey focuses on SAM's applications to various tasks and data types by discussing its historical development, recent progress, and profound impact on broad applications. We first introduce the background and terminology for foundation models, including SAM, as well as state-of-the-art methods contemporaneous with SAM that are significant for the segment-anything task. Then, we analyze and summarize the advantages and limitations of SAM across various image processing applications, including software scenes, real-world scenes, and complex scenes. Importantly, we draw many insights to guide future research toward more versatile foundation models and an improved SAM architecture. We also summarize a large number of other applications of SAM in vision and beyond. Finally, we maintain a continuously updated paper list and an open-source project summary for the SAM foundation model at https://github.com/liliu-avril/Awesome-Segment-Anything.
