Boosting Few-Shot Semantic Segmentation Via Segment Anything Model

Chen-Bin Feng, Qi Lai, Kangdao Liu, Houcheng Su, Chi-Man Vong

TL;DR

This work tackles contour inaccuracies in few-shot semantic segmentation (FSS) by introducing a training-free post-processing pipeline that leverages the Segment Anything Model (SAM). The method generates prompts from an initial FSS mask, refines the prediction with SAM, and then applies a prediction result selection (PRS) step to exclude likely false SAM outputs, boosting segmentation quality without additional training. The approach is validated on PASCAL-$5^i$ and COCO-$20^i$, showing consistent improvements over strong baselines in both quantitative metrics (e.g., mIoU and FB-mIoU) and qualitative edge/detail fidelity. The training-free, plug-and-play design makes it readily applicable to existing FSS systems, potentially expanding SAM’s utility in downstream tasks requiring precise mask contours.

Abstract

In semantic segmentation, accurate prediction masks are crucial for downstream tasks such as medical image analysis and image editing. Due to the lack of annotated data, few-shot semantic segmentation (FSS) performs poorly in predicting masks with precise contours. Recently, we have noticed that the large foundation model Segment Anything Model (SAM) performs well in processing detailed features. Inspired by SAM, we propose FSS-SAM to boost FSS methods by addressing the issue of inaccurate contours. FSS-SAM is training-free. It works as a post-processing tool for any FSS method and can improve the accuracy of predicted masks. Specifically, we use predicted masks from FSS methods to generate prompts and then use SAM to predict new masks. To avoid predicting wrong masks with SAM, we propose a prediction result selection (PRS) algorithm. The algorithm can remarkably decrease wrong predictions. Experimental results on public datasets show that our method is superior to base FSS methods in both quantitative and qualitative aspects.
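As a rough illustration of the prompt-generation step described above, the following sketch derives two common SAM prompt types (a bounding box and a single foreground point) from a binary FSS mask. The function name `mask_to_prompts`, the choice of box and point prompts, and the "foreground pixel nearest the centroid" rule are assumptions made for illustration; this summary does not specify which prompts the authors use or how they compute them. Passing the prompts to SAM itself is not shown.

```python
import numpy as np


def mask_to_prompts(mask: np.ndarray):
    """Derive a box prompt and a point prompt from a binary mask.

    Hypothetical helper, not the authors' implementation.
    Returns (box, point) where box = (x_min, y_min, x_max, y_max)
    and point = (x, y), or (None, None) if the mask has no foreground.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        # Empty FSS prediction: there is nothing to prompt SAM with.
        return None, None

    # Tight bounding box around all foreground pixels.
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

    # Pick the foreground pixel closest to the centroid, so the point
    # is guaranteed to lie on the mask even for non-convex shapes.
    cy, cx = ys.mean(), xs.mean()
    k = int(np.argmin((ys - cy) ** 2 + (xs - cx) ** 2))
    point = (int(xs[k]), int(ys[k]))
    return box, point


if __name__ == "__main__":
    demo = np.zeros((6, 6), dtype=np.uint8)
    demo[1:4, 2:5] = 1  # a 3x3 foreground square
    print(mask_to_prompts(demo))
```

Both prompts use (x, y) pixel coordinates, which is the convention SAM's prompt inputs follow; a different prompt design (for example, several points or a coarse mask prompt) would slot into the same place in the pipeline.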

Paper Structure

This paper contains 24 sections, 2 theorems, 3 equations, 9 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

When Condition 1 is met, $M^q_{SAM}$ is a correct prediction.
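Condition 1 itself is not reproduced in this summary, so the exact selection rule cannot be shown here. As a hedged sketch of how an IoU-threshold selection step of the kind named in the Figure 3 caption might look, the code below keeps the SAM mask only when it overlaps sufficiently with the FSS mask and otherwise falls back to the FSS mask. The rule `IoU >= T`, the default `threshold=0.5`, and the function names are all assumptions for illustration, not the paper's PRS algorithm.

```python
import numpy as np


def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two binary masks of the same shape."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both masks are empty; treat the overlap as zero.
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def select_prediction(m_fss: np.ndarray, m_sam: np.ndarray,
                      threshold: float = 0.5) -> np.ndarray:
    """Hypothetical selection rule (assumed, not from the paper).

    Keep the SAM mask when it agrees with the FSS mask (IoU >= threshold);
    otherwise fall back to the original FSS mask.
    """
    return m_sam if iou(m_fss, m_sam) >= threshold else m_fss


if __name__ == "__main__":
    m_fss = np.zeros((4, 4), dtype=np.uint8)
    m_fss[0:2, 0:2] = 1
    m_sam = np.zeros((4, 4), dtype=np.uint8)
    m_sam[0:2, 0:3] = 1
    chosen = select_prediction(m_fss, m_sam)
    print("kept SAM mask" if chosen is m_sam else "kept FSS mask")
```

The intuition behind a rule of this shape is that a SAM output which drifts far from the FSS mask has probably segmented a different object, so the base prediction is the safer choice in that case.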

Figures (9)

  • Figure 1: Qualitative comparison of the segmentation masks (marked in white) and the corresponding inpainting (IP) results from the base FSS method and our boosted FSS-SAM.
  • Figure 2: Quantitative comparison of base FSS, FSS-SAM without the PRS algorithm, and FSS-SAM with the PRS algorithm in terms of mIoU and FB-mIoU on PASCAL-$5^i$.
  • Figure 3: Framework of our FSS-SAM. The FSS model and SAM are both frozen, so all of their parameters are fixed. In the PRS algorithm, IoU means intersection over union, and T denotes the threshold. YES and NO indicate whether the condition is met.
  • Figure 4: Illustration of the different prompts.
  • Figure 5: Correct and incorrect predictions by FSS-SAM. In each sub-figure, the top left is $I^q$, the top right is $M^q_{FSS}$, the bottom left is $GT^q$, and the bottom right is $M^q_{SAM}$.
  • ...and 4 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2