On Efficient Variants of Segment Anything Model: A Survey

Xiaorui Sun, Jun Liu, Heng Tao Shen, Xiaofeng Zhu, Ping Hu

TL;DR

This survey analyzes the growing ecosystem of efficient Segment Anything Model (SAM) variants, detailing how lightweight backbones, distillation, quantization, pruning, and refactoring reduce latency while preserving segmentation quality. It introduces a structured taxonomy of approaches for accelerating SegAny and SegEvery tasks and provides a unified evaluation across COCO, LVIS, SGinW, and UVO to compare efficiency and accuracy. Key contributions include a comprehensive catalog of methods (from training-from-scratch to encoder-level distillation and sampler optimizations), and practical guidance for hardware-specific deployment. The findings show that carefully designed backbones (e.g., EfficientViT-SAM, NanoSAM) and sampling strategies can dramatically improve throughput on edge devices and CPUs with only modest accuracy trade-offs, guiding future research toward hybrid architectures, sparsity, and multi-domain universal segmentation.

Abstract

The Segment Anything Model (SAM) is a foundational model for image segmentation tasks, known for its strong generalization across diverse applications. However, its impressive performance comes with significant computational and resource demands, making it challenging to deploy in resource-limited environments such as edge devices. To address this, a variety of SAM variants have been proposed to improve efficiency while preserving accuracy. This survey provides the first comprehensive review of these efficient SAM variants. We begin by exploring the motivations driving this research. We then present core techniques used in SAM and model acceleration. This is followed by a detailed exploration of SAM acceleration strategies, categorized by approach, and a discussion of several future research directions. Finally, we offer a unified and extensive evaluation of these methods across various hardware platforms, assessing their efficiency and accuracy on representative benchmarks and providing a clear comparison of their overall performance.

Paper Structure (30 sections, 11 equations, 18 figures, 11 tables)

Figures (18)

  • Figure 1: The architectures of (a) SAM [kirillov2023segment] and (b) the recently proposed SAM 2 [ravi2024sam].
  • Figure 2: Illustration of the Segment Anything task (SegAny) and the Segment Everything task (SegEvery).
  • Figure 3: Taxonomy of Efficient Variants of Segment Anything Model (SAM).
  • Figure 4: The architecture of FastSAM [zhao2023fast]. It segments anything in two stages: all-instance segmentation and prompt-guided selection. Notably, the outputs of the first stage are used directly as the SegEvery results.
  • Figure 5: The architecture of SqueezeSAM [varadarajan2023squeezesam]. It replaces the Transformer-based encoder-decoder structure with a U-Net backbone [ronneberger2015u]. User clicks and masks of salient objects are fed into SqueezeSAM together with the input image to enable interactive segmentation.
  • ...and 13 more figures