
SAM3-Adapter: Efficient Adaptation of Segment Anything 3 for Camouflage Object Segmentation, Shadow Detection, and Medical Image Segmentation

Tianrun Chen, Runlong Cao, Xinda Yu, Lanyun Zhu, Chaotao Ding, Deyi Ji, Cheng Chen, Qi Zhu, Chunyan Xu, Papa Mao, Ying Zang

TL;DR

This paper addresses the gap in fine-grained segmentation where large foundation models struggle, by pairing the Segment Anything 3 (SAM3) backbone with SAM3-Adapter, a lightweight, per-stage adapter system. The adapters generate task-specific prompts $P^i$ from stage inputs $F_i$ (composed of signals like $F_{pe}$ and $F_{hfc}$) via $P^i = {\rm MLP}_{up}(\text{GELU}({\rm MLP}_{tune}^i(F_i)))$ and inject them into the transformer layers to tailor segmentation to each domain. The approach delivers state-of-the-art performance across camouflaged object detection ($S_\alpha$, $E_\phi$, MAE), shadow detection (BER), polyp segmentation ($mDice$, $mIoU$), and cell segmentation (F1), while maintaining efficiency by freezing the SAM3 encoder and reusing a compact adapter. Extensive experiments on COD10K, CAMO, CHAMELEON, ISTD, Kvasir-SEG, and NeurIPS 2022 Cell Segmentation demonstrate robust gains and generalizability, supported by open-source code and data processing pipelines. This work proves that scaling SAM3, when combined with intelligent adapters, yields substantial, practical gains for domain-specific segmentation tasks.
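The prompt formula above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the dimensions, the weight initialisation, and the helper names (`make_linear`, `linear`, `adapter_prompt`) are hypothetical, each MLP is reduced to a single linear layer, and the sharing of ${\rm MLP}_{up}$ across stages is an assumption read from the notation (it carries no stage index $i$, while ${\rm MLP}_{tune}^i$ does). The stage input $F_i$ is taken as a given vector; how it is composed from $F_{pe}$ and $F_{hfc}$ is not modelled here.

```python
import math
import random


def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def make_linear(in_dim, out_dim, rng):
    # Hypothetical initialiser: small Gaussian weights, zero bias.
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(params, x):
    # y = W x + b for a single feature vector x.
    weight, bias = params
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def adapter_prompt(f_i, mlp_tune_i, mlp_up):
    # P^i = MLP_up(GELU(MLP_tune^i(F_i)))
    hidden = [gelu(h) for h in linear(mlp_tune_i, f_i)]
    return linear(mlp_up, hidden)


rng = random.Random(0)
embed_dim, hidden_dim, num_stages = 8, 2, 3  # illustrative sizes only

# Assumption: one up-projection shared by all stages, one tune layer per stage.
mlp_up = make_linear(hidden_dim, embed_dim, rng)
mlp_tune = [make_linear(embed_dim, hidden_dim, rng) for _ in range(num_stages)]

# Stand-in stage inputs F_i (random here; real ones come from the image).
stage_inputs = [[rng.gauss(0.0, 1.0) for _ in range(embed_dim)] for _ in range(num_stages)]

prompts = [
    adapter_prompt(stage_inputs[i], mlp_tune[i], mlp_up) for i in range(num_stages)
]
```

Each entry of `prompts` has the same width as the stage input, so under these assumptions it could be added to the features of the corresponding frozen transformer stage. The narrow hidden layer is what keeps the number of trainable parameters small relative to the encoder.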

Abstract

The rapid rise of large-scale foundation models has reshaped the landscape of image segmentation, with models such as Segment Anything achieving unprecedented versatility across diverse vision tasks. However, previous generations, including SAM and its successor, still struggle with fine-grained, low-level segmentation challenges such as camouflaged object detection, medical image segmentation, cell image segmentation, and shadow detection. To address these limitations, we originally proposed SAM-Adapter in 2023, demonstrating substantial gains on these difficult scenarios. With the emergence of Segment Anything 3 (SAM3), a more efficient and higher-performing evolution with a redesigned architecture and improved training pipeline, we revisit these long-standing challenges. In this work, we present SAM3-Adapter, the first adapter framework tailored for SAM3 that unlocks its full segmentation capability. SAM3-Adapter not only reduces computational overhead but also consistently surpasses both SAM- and SAM2-based solutions, establishing new state-of-the-art results across multiple downstream tasks, including medical imaging, camouflaged (concealed) object segmentation, and shadow detection. Built upon the modular and composable design philosophy of the original SAM-Adapter, SAM3-Adapter provides stronger generalizability, richer task adaptability, and significantly improved segmentation precision. Extensive experiments confirm that integrating SAM3 with our adapter yields superior accuracy, robustness, and efficiency compared to all prior SAM-based adaptations. We hope SAM3-Adapter can serve as a foundation for future research and practical segmentation applications. Code, pre-trained models, and data processing pipelines are available.

Paper Structure

This paper contains 14 sections, 2 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: The architecture of the proposed SAM3-Adapter.
  • Figure 2: Segmentation performance visualization on CHAMELEON samples. The figure highlights the limitations of SAM, SAM2, and SAM3 in handling severe camouflage, where they produce meaningless outputs. Although SAM-Adapter enhances segmentation quality, our SAM3-Adapter delivers the best performance, generating precise masks that match the ground truth more closely than those of its predecessors.
  • Figure 3: Camouflaged image segmentation on the COD-10K dataset. Examples from the COD-10K dataset illustrating animals that are strongly camouflaged within their natural backgrounds. The original SAM frequently fails to accurately localize these targets and can produce fragmented or semantically incoherent segmentations; SAM2 and SAM3 exhibit similar limitations, occasionally producing no mask or incorrect outputs. With the integration of SAM3-Adapter, segmentation reliability on these challenging instances is substantially improved, achieving clear gains over earlier SAM2-Adapter variants.
  • Figure 4: Camouflaged examples from the CAMO dataset. The original SAM and SAM2 struggle to perceive animals that are visually concealed within their natural surroundings. SAM3, however, has already gained the ability to distinguish the camouflaged object, as can be observed. Integrating SAM3-Adapter further improves the model's ability to segment the concealed targets.
  • Figure 5: Visualization of shadow detection results. SAM and SAM2 fail to identify shadows. Standalone SAM3 demonstrates a foundational ability to identify shadows, but struggles with precise boundaries. Our SAM3-Adapter unlocks SAM3's full potential, transforming its initial perception into state-of-the-art segmentation masks with sharp, accurate contours.
  • ...and 1 more figures