SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation

Xinyu Xiong, Zihuang Wu, Shuangyi Tan, Wenxue Li, Feilong Tang, Ying Chen, Siying Li, Jie Ma, Guanbin Li

TL;DR

The paper introduces SAM2-UNet, a simple U-shaped segmentation framework that pairs the Hiera encoder from SAM2 with a classic U-Net-style decoder and inserts adapters into the encoder for parameter-efficient fine-tuning. It reports strong cross-domain performance on natural and medical image segmentation, evaluated on eighteen datasets across five benchmarks, and outperforms specialized state-of-the-art methods. Key contributions include a three-component design (Hiera encoder with adapters, receptive field blocks (RFBs), and a U-Net decoder) and a weighted IoU + BCE loss applied with deep supervision. The results position SAM2-UNet as a robust, scalable baseline for adapting vision foundation models to diverse segmentation tasks.
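To make the loss term concrete, the following is a minimal sketch of a weighted IoU + BCE loss with deep supervision, written in plain Python over flattened pixel lists. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `smooth` constant, the externally supplied per-pixel `weights`, and the unweighted sum over side outputs are all hypothetical choices, since this summary does not specify how the weights are computed or how the supervised outputs are combined.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_bce_iou(logits, mask, weights, smooth=1.0):
    """Weighted BCE plus weighted IoU loss for one predicted map.

    logits, mask, weights: flat lists with one value per pixel.
    weights: per-pixel importance (assumption: supplied by the caller).
    smooth: hypothetical smoothing constant that avoids division by zero.
    """
    bce_sum = 0.0
    inter = 0.0
    total = 0.0
    for z, y, w in zip(logits, mask, weights):
        p = sigmoid(z)
        # Stable binary cross-entropy computed directly from the logit.
        bce_sum += w * (max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z))))
        inter += w * p * y
        total += w * (p + y)
    w_bce = bce_sum / sum(weights)
    union = total - inter
    w_iou = 1.0 - (inter + smooth) / (union + smooth)
    return w_bce + w_iou


def deep_supervision_loss(side_logits, mask, weights):
    """Sum the loss over every supervised output (assumption: equal weighting)."""
    return sum(weighted_bce_iou(logits, mask, weights) for logits in side_logits)


if __name__ == "__main__":
    mask = [1.0, 0.0, 1.0, 0.0]
    weights = [1.0, 1.0, 1.0, 1.0]
    good = [20.0, -20.0, 20.0, -20.0]   # confident and correct
    bad = [-20.0, 20.0, -20.0, 20.0]    # confident and wrong
    print(weighted_bce_iou(good, mask, weights))
    print(weighted_bce_iou(bad, mask, weights))
    print(deep_supervision_loss([good, good, bad], mask, weights))
```

In this sketch a confident, correct prediction drives both terms toward zero, while raising the weight of a misclassified pixel increases the loss, which is the intended effect of the weighting.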

Abstract

Image segmentation plays an important role in vision understanding. Recently, emerging vision foundation models have continuously achieved superior performance on various tasks. Following such success, in this paper, we prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models. We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation. Specifically, SAM2-UNet adopts the Hiera backbone of SAM2 as the encoder, while the decoder uses the classic U-shaped design. Additionally, adapters are inserted into the encoder to allow parameter-efficient fine-tuning. Preliminary experiments on various downstream tasks, such as camouflaged object detection, salient object detection, marine animal segmentation, mirror detection, and polyp segmentation, demonstrate that our SAM2-UNet can simply beat existing specialized state-of-the-art methods without bells and whistles. Project page: https://github.com/WZH0120/SAM2-UNet.
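The adapter idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not describe the adapter's internal structure, so everything below is an assumption: a generic bottleneck adapter (down-projection, GELU, up-projection, residual connection) placed in front of a frozen block. The class name `BottleneckAdapter`, the helper `adapted_block`, the zero-initialised up-projection, and the dimensions are hypothetical and chosen only to show why such a module adds few trainable parameters.

```python
import math
import random


def gelu(x):
    """Exact GELU activation using the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, W, b):
    """Affine map; W has one row per output feature."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


class BottleneckAdapter:
    """Hypothetical bottleneck adapter: x + up(gelu(down(x)))."""

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        self.dim, self.hidden = dim, hidden
        self.W_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(hidden)]
        self.b_down = [0.0] * hidden
        # Assumption: zero-initialised up-projection, so the adapter starts as identity.
        self.W_up = [[0.0] * hidden for _ in range(dim)]
        self.b_up = [0.0] * dim

    def __call__(self, x):
        h = [gelu(v) for v in linear(x, self.W_down, self.b_down)]
        delta = linear(h, self.W_up, self.b_up)
        return [xi + di for xi, di in zip(x, delta)]

    def num_params(self):
        """Count the trainable weights and biases of both projections."""
        return 2 * self.dim * self.hidden + self.hidden + self.dim


def adapted_block(x, adapter, frozen_block):
    """Apply the trainable adapter, then a frozen (never updated) encoder block."""
    return frozen_block(adapter(x))


if __name__ == "__main__":
    adapter = BottleneckAdapter(dim=8, hidden=2)
    tokens = [0.1 * i for i in range(8)]
    frozen = lambda v: [2.0 * t for t in v]  # stand-in for a pretrained block
    print(adapted_block(tokens, adapter, frozen))
    print(adapter.num_params())
```

Only the adapter's parameters would be updated during fine-tuning in this setup. The count grows linearly with the feature dimension because the hidden width is kept small, which is what keeps the approach parameter-efficient relative to updating the whole encoder.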

Paper Structure (8 sections, 3 figures, 9 tables)


Figures (3)

  • Figure 1: Overview of the proposed SAM2-UNet. Note that there are some variants of the Hiera block, and we only demonstrate a simplified structure for ease of understanding.
  • Figure 2: Visualization results on camouflaged object detection.
  • Figure 3: Visualization results on polyp segmentation.