Boosting SAM for Cross-Domain Few-Shot Segmentation via Conditional Point Sparsification

Jiahao Nie, Yun Xing, Wenbin An, Qingsong Zhao, Jiawei Shao, Yap-Peng Tan, Alex C. Kot, Shijian Lu, Xuelong Li

TL;DR

The paper addresses cross-domain few-shot segmentation challenges for SAM by showing that dense prompt points degrade performance under domain shift. It introduces Conditional Point Sparsification (CPS), a training-free approach that uses reference exemplars to adapt prompt density through boundary-aware dense matching, adaptive region-wise sparsification, and a reference-density lookup, followed by post-hoc mask refinement. CPS consistently outperforms existing training-free SAM-based methods across four CD-FSS datasets and demonstrates robust generalization to in-domain natural image FSS tasks. The proposed method offers a practical, data-free strategy to extend SAM’s segmentation capabilities to diverse domains with minimal computation.

Abstract

Motivated by the success of the Segment Anything Model (SAM) in promptable segmentation, recent studies leverage SAM to develop training-free solutions for few-shot segmentation, which aims to predict object masks in the target image based on a few reference exemplars. These SAM-based methods typically rely on point matching between reference and target images and use the matched dense points as prompts for mask prediction. However, we observe that dense points perform poorly in Cross-Domain Few-Shot Segmentation (CD-FSS), where target images are from medical or satellite domains. We attribute this issue to large domain shifts that disrupt the point-image interactions learned by SAM, and find that point density plays a crucial role under such conditions. To address this challenge, we propose Conditional Point Sparsification (CPS), a training-free approach that adaptively guides SAM interactions for cross-domain images based on reference exemplars. Leveraging ground-truth masks, the reference images provide reliable guidance for adaptively sparsifying dense matched points, enabling more accurate segmentation results. Extensive experiments demonstrate that CPS outperforms existing training-free SAM-based methods across diverse CD-FSS datasets.
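To make the idea of thinning dense matched points concrete, here is a minimal sketch under stated assumptions. It is not the CPS procedure: the abstract gives no algorithmic details, so greedy farthest-point sampling is swapped in as a generic way to reduce a dense point set to a target density. The function name `sparsify_points` and the `keep_ratio` parameter are hypothetical; `keep_ratio` loosely stands in for the density that CPS would derive from the reference exemplar.

```python
import math


def sparsify_points(points, keep_ratio):
    """Thin a dense list of (x, y) prompt points by greedy farthest-point sampling.

    Assumption: keep_ratio is a hypothetical knob standing in for a
    reference-derived density; this is a generic stand-in, not the paper's method.
    """
    if not points:
        return []
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, math.ceil(keep_ratio * len(points)))

    # Seed with the point closest to the centroid so the result is deterministic.
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    seed = min(range(len(points)), key=lambda i: math.dist(points[i], (cx, cy)))

    kept = [seed]
    # min_dist[i] is the distance from point i to its nearest kept point so far.
    min_dist = [math.dist(p, points[seed]) for p in points]
    while len(kept) < n_keep:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        if min_dist[nxt] == 0.0:
            break  # every remaining point coincides with a kept one
        kept.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return [points[i] for i in kept]
```

For example, calling `sparsify_points(matched_points, 0.25)` on a list of matched target-image coordinates would keep roughly a quarter of them, spread as evenly as possible; the surviving points could then be passed to SAM as point prompts.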

Paper Structure

This paper contains 19 sections, 20 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: (a) Overview of SAM for promptable segmentation. (b–d) Existing methods exhibit inconsistent segmentation behavior between in-domain and cross-domain images.
  • Figure 2: Analysis of SAM-encoded target-object patch features. (a) Category level: t-SNE visualization shows that in-domain images exhibit larger intra-category variance than cross-domain images. (b) Dataset level: Mean variance of all categories (bars) and inter-category variance range (error bars) confirm this trend.
  • Figure 3: Cross-domain and in-domain datasets exhibit varying sensitivities to prompt point density.
  • Figure 4: Overview of the proposed Conditional Point Sparsification (CPS). CPS not only leverages the reference image to match candidate prompt points in the target image, but also exploits the reference image to determine an appropriate point density for subsequent sparsification. The proposed modules, including boundary point pruning, adaptive point sparsification, and post-hoc mask refinement, jointly contribute to producing accurate segmentation masks of the target image.
  • Figure 5: Qualitative segmentation results (red mask) of CPS on samples from four Cross-Domain Few-Shot Segmentation datasets. More examples are provided in the appendix.
  • ...and 1 more figure