Adaptive Prompt Learning with SAM for Few-shot Scanning Probe Microscope Image Segmentation

Yao Shen, Ziwei Wei, Chunmeng Liu, Shuming Wei, Qi Zhao, Kaiyang Zeng, Guangyao Li

TL;DR

The proposed APL-SAM framework significantly outperforms the original SAM, achieving over a 30% improvement in terms of Dice Similarity Coefficient with only one-shot guidance, and surpasses state-of-the-art few-shot segmentation methods and even fully supervised approaches in performance.

Abstract

The Segment Anything Model (SAM) has demonstrated strong performance in segmenting natural scene images. However, its effectiveness diminishes markedly when applied to specific scientific domains, such as Scanning Probe Microscope (SPM) images. This decline in accuracy can be attributed to the distinct data distribution of scientific images and the limited availability of such data. Moreover, acquiring adequate SPM datasets is time-intensive, laborious, and skill-dependent. To address these challenges, we propose an Adaptive Prompt Learning with SAM (APL-SAM) framework tailored for few-shot SPM image segmentation. Our approach incorporates two key innovations to enhance SAM: 1) an Adaptive Prompt Learning module that leverages few-shot embeddings derived from a limited support set to adaptively learn central representatives, which serve as visual prompts. This eliminates the need for time-consuming online user interaction to provide prompts, such as exhaustively marking points and bounding boxes slice by slice; 2) a multi-source, multi-level mask decoder, specifically designed for few-shot SPM image segmentation, that effectively captures the correspondence between the support and query images. To facilitate comprehensive training and evaluation, we introduce a new dataset, SPM-Seg, curated for SPM image segmentation. Extensive experiments on this dataset show that the proposed APL-SAM framework significantly outperforms the original SAM, achieving over a 30% improvement in Dice Similarity Coefficient with only one-shot guidance. Moreover, APL-SAM surpasses state-of-the-art few-shot segmentation methods and even fully supervised approaches. The code and dataset used in this study will be made available upon acceptance.
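
The headline result above is reported in terms of the Dice Similarity Coefficient, which for a predicted binary mask P and a ground-truth mask G is 2|P ∩ G| / (|P| + |G|). The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `dice_coefficient`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[Sequence[int]],
                     target: Sequence[Sequence[int]]) -> float:
    """Dice Similarity Coefficient between two binary masks of equal shape.

    Masks are nested sequences of 0/1 values (rows of pixels).
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels that are foreground in both masks
    pred_area = 0     # foreground pixels in the prediction
    target_area = 0   # foreground pixels in the ground truth
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            intersection += int(p and t)
            pred_area += int(p)
            target_area += int(t)

    if pred_area + target_area == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / (pred_area + target_area)


# Toy 2x3 masks (illustrative values, not data from the paper).
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
score = dice_coefficient(pred_mask, true_mask)
print(f"Dice = {score:.4f}")
```

In the toy example the masks share two foreground pixels and each contains three, so the score is 2·2 / (3 + 3) ≈ 0.667.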

Paper Structure

This paper contains 21 sections, 9 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Segmentation results produced by the original Segment Anything Model (SAM) on the SPM-Seg dataset. (a) Input images; (b) corresponding ground truths; (c)-(f) predictions of the original SAM using 1-point, 5-point, 10-point and 20-point prompts, respectively. Here, $LAGP$ is the $Li_{1.5}Al_{0.5}Ge_{1.5}(PO_{4})_{3}$ solid electrolyte; $LiFeO_{4}$, $LiCoO_{2}$ and $LiMn_{2}O_{4}$ are electrode materials for Li-ion batteries; and $BiOCl$ is a crystal powder. The scan size for samples $LiFeO_{4}$, $LAGP$ and $LiCoO_{2}$ is $5\mu m \times 5\mu m$, and the scan sizes for samples $LiMn_{2}O_{4}$ and $BiOCl$ are $3\mu m \times 3\mu m$ and $2\mu m \times 2\mu m$, respectively.
  • Figure 2: Examples of one-way one-shot segmentation tasks. For each material class, an episode (task) consists of a support set $S$ (denoted by the dashed box) and a query image $I_{Q}$. Each support set includes one support image $I_{S}$ along with its corresponding mask $M_{S}$ (one-shot). In each episode, only a single material class is targeted for segmentation (one-way). Here, the scan size for both $LiFeO_{4}$ and $LAGP$ is $5\mu m \times 5\mu m$.
  • Figure 3: Overview of the APL-SAM architecture proposed in this study. The image encoder of SAM remains frozen, while a series of lightweight adapters are introduced to adapt it to SPM images. The APL-based prompt encoder and the multi-level mask decoder are tuned during training.
  • Figure 4: Architecture of the Adaptive Prompt Learning based prompt encoder proposed in this study.
  • Figure 5: Structure of the multi-level mask decoder (MLMD). Four attention-based blocks process the prompted embeddings at different levels in parallel and the final prediction is obtained by integrating all four outputs.
  • ...and 2 more figures
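
The episode structure described in the Figure 2 caption (a support set of image/mask pairs plus one query image, all from a single material class) can be sketched as a small data structure and sampler. This is an illustrative sketch only, not the authors' data pipeline: the `Episode` class, the `sample_episode` function, and the file names in `toy_dataset` are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# An (image path, mask path) pair; the paths used below are placeholders.
Sample = Tuple[str, str]


@dataclass
class Episode:
    material: str           # the single target class ("one-way")
    support: List[Sample]   # K support image/mask pairs ("K-shot")
    query_image: str        # image to be segmented
    query_mask: str         # ground truth, used only for loss/evaluation


def sample_episode(dataset: Dict[str, List[Sample]],
                   material: str,
                   shots: int = 1,
                   rng: Optional[random.Random] = None) -> Episode:
    """Draw a one-way K-shot episode for one material class."""
    rng = rng or random.Random()
    samples = dataset[material]
    if len(samples) < shots + 1:
        raise ValueError("need at least shots + 1 samples for an episode")
    # Sample without replacement so the query never appears in the support set.
    picked = rng.sample(samples, shots + 1)
    support = picked[:shots]
    query_image, query_mask = picked[shots]
    return Episode(material, support, query_image, query_mask)


# Hypothetical file listing, for illustration only.
toy_dataset = {
    "LAGP": [("lagp_0.png", "lagp_0_mask.png"),
             ("lagp_1.png", "lagp_1_mask.png"),
             ("lagp_2.png", "lagp_2_mask.png")],
}
episode = sample_episode(toy_dataset, "LAGP", shots=1, rng=random.Random(0))
print(episode)
```

Setting `shots=1` gives the one-shot case shown in Figure 2; larger values would give a K-shot support set under the same assumptions.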