SAM Fails to Segment Anything? -- SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More

Tianrun Chen, Lanyun Zhu, Chaotao Ding, Runlong Cao, Yan Wang, Zejian Li, Lingyun Sun, Papa Mao, Ying Zang

TL;DR

The paper interrogates the limitations of Segment Anything (SAM) for challenging, low-level segmentation tasks such as camouflage and shadow detection. It introduces SAM-Adapter, a lightweight adapter approach that injects task-specific visual prompts into the frozen SAM backbone to improve downstream performance without fine-tuning SAM itself. Across camouflaged object detection, shadow detection, and polyp segmentation, SAM-Adapter achieves state-of-the-art or substantially improved results, validating the feasibility of adapting large pre-trained segmentation models to specialized domains. This work highlights a practical pathway for extending foundation models to domain-specific tasks in areas like medicine, agriculture, and remote sensing, using simple yet effective adapters and prompts.

Abstract

The emergence of large models, also known as foundation models, has brought significant advancements to AI research. One such model is Segment Anything (SAM), which is designed for image segmentation tasks. However, as with other foundation models, our experimental findings suggest that SAM may fail or perform poorly in certain segmentation tasks, such as shadow detection and camouflaged object detection (concealed object detection). This study is the first to pave the way for applying the large pre-trained image segmentation model SAM to these downstream tasks, even in situations where SAM performs poorly. Rather than fine-tuning the SAM network, we propose SAM-Adapter, which incorporates domain-specific information or visual prompts into the segmentation network by using simple yet effective adapters. By integrating task-specific knowledge with the general knowledge learnt by the large model, SAM-Adapter can significantly elevate the performance of SAM in challenging tasks, as shown in extensive experiments. We can even outperform task-specific network models and achieve state-of-the-art performance in the tasks we tested: camouflaged object detection and shadow detection. We also tested polyp segmentation (medical image segmentation) and achieved better results. We believe our work opens up opportunities for utilizing SAM in downstream tasks, with potential applications in various fields, including medical image processing, agriculture, remote sensing, and more.
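
The adapter idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`frozen_layer`, `adapter`, `adapted_layer`), the tiny 3-to-1-to-3 bottleneck, the ReLU nonlinearity, and the zero-initialized up-projection are all assumptions chosen for illustration. The sketch only shows the general pattern of leaving pre-trained weights untouched while a small trainable branch adds task-specific information.

```python
# Illustrative sketch only. All names, sizes, and the ReLU nonlinearity are
# assumptions for this example, not details taken from the paper.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def relu(vector):
    return [max(0.0, v) for v in vector]

# Stand-in for one frozen, pre-trained layer: these weights are never updated.
FROZEN_W = [[1.0, 0.0, 0.5],
            [0.0, 2.0, -1.0],
            [0.5, 0.5, 0.5]]

def frozen_layer(features):
    return matvec(FROZEN_W, features)

def adapter(task_info, down_w, up_w):
    """Small bottleneck: project down, apply a nonlinearity, project back up."""
    return matvec(up_w, relu(matvec(down_w, task_info)))

def adapted_layer(features, task_info, down_w, up_w):
    """Add the adapter output to the layer input, then run the frozen layer."""
    prompt = adapter(task_info, down_w, up_w)
    prompted = [f + p for f, p in zip(features, prompt)]
    return frozen_layer(prompted)

features = [1.0, 2.0, 3.0]
task_info = [0.2, -0.4, 0.6]            # hypothetical task-specific input
down_w = [[1.0, 1.0, 1.0]]              # 3 -> 1 projection (trainable)
zero_up_w = [[0.0], [0.0], [0.0]]       # 1 -> 3 projection, zero-initialized

baseline = frozen_layer(features)
at_init = adapted_layer(features, task_info, down_w, zero_up_w)

trained_up_w = [[1.0], [0.0], [-1.0]]   # pretend values after training
after = adapted_layer(features, task_info, down_w, trained_up_w)
```

With the up-projection at zero, `at_init` equals `baseline`, so the adapted model starts out behaving exactly like the frozen one. Once the adapter weights take non-zero values, `after` differs from `baseline` even though `FROZEN_W` is unchanged. Zero initialization is a common adapter design choice and is assumed here, not confirmed by the source.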

Paper Structure (15 sections, 2 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: The architecture of the proposed SAM-Adapter.
  • Figure 2: Visualization results for camouflaged image segmentation. As illustrated in the figure, SAM fails to perceive animals that are visually ‘hidden’ (concealed) in their natural surroundings. With SAM-Adapter, our approach significantly elevates the object segmentation performance of SAM. The samples are from the COD-10K dataset; for other datasets, please refer to the More Results section.
  • Figure 3: Visualization results for camouflaged image segmentation with different SAM prompting approaches. Two settings are compared: SAM with input point prompts sampled uniformly across the image (the "everything" mode of the SAM online demo, which produces multiple masks; denoted SAM online in the figure), and SAM with no input points but a box the size of the whole image as the prompt (denoted SAM). In both prompting modes, SAM cannot fully identify the object. With SAM-Adapter, our approach significantly elevates the object segmentation performance of SAM.
  • Figure 4: Shadow detection with different SAM prompting approaches. We use SAM with input point prompts sampled uniformly across the image (SAM online in the figure) and with a box covering the whole image (SAM in the figure). SAM cannot fully identify the shadow in either prompting mode. With SAM-Adapter, our approach elevates the performance of SAM.
  • Figure 5: Visualization results for shadow detection. As illustrated in the figure, SAM fails to distinguish the shadow from the background object. SAM is used with a box prompt the size of the whole image as input and no input point prompts. With SAM-Adapter, our approach significantly elevates the segmentation performance of SAM.
  • ...and 2 more figures
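
The two baseline prompting modes named in the captions of Figures 3 and 4 (points spread evenly across the image, and a single box the size of the whole image) can be sketched as below. The helper names `grid_point_prompts` and `whole_image_box`, the cell-centered placement of points, the grid density, and the `(x_min, y_min, x_max, y_max)` box layout are assumptions for illustration. They are not taken from the paper or from the SAM codebase, whose coordinate conventions may differ.

```python
# Illustrative sketch only. Helper names and coordinate conventions are
# assumptions, not the paper's or SAM's actual interface.

def grid_point_prompts(width, height, points_per_side):
    """Return (x, y) pixel coordinates on a regular grid, one per grid cell center."""
    step_x = width / points_per_side
    step_y = height / points_per_side
    return [((col + 0.5) * step_x, (row + 0.5) * step_y)
            for row in range(points_per_side)
            for col in range(points_per_side)]

def whole_image_box(width, height):
    """Return one box prompt (x_min, y_min, x_max, y_max) covering the full image."""
    return (0, 0, width, height)

# Example: a 4x2 pixel image with a 2x2 grid of point prompts.
points = grid_point_prompts(4, 2, 2)
box = whole_image_box(640, 480)
```

In this sketch, the first mode hands the model many point prompts and lets it propose a mask for each, while the second supplies one box that gives no hint about where the target lies. Neither prompt carries task-specific information, which fits the captions' observation that SAM cannot fully identify the target in either mode.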