
APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

Weizhao He, Yang Zhang, Wei Zhuo, Linlin Shen, Jiaqi Yang, Songhe Deng, Liang Sun

TL;DR

APSeg addresses cross-domain few-shot semantic segmentation by integrating a frozen SAM backbone with two key components: DPAT, which fuses support and pseudo query prototypes to compute a robust, domain-agnostic feature transformation, and MPG, which autonomously generates sparse and dense prompt embeddings for SAM. By using cycle-consistent selection to augment the prototypes, together with a closed-form transformation given by a shared matrix $W$ that satisfies $W P^{\mathrm{m}} = A$ (mapping the fused prototypes $P^{\mathrm{m}}$ onto the anchors $A$), APSeg can be deployed directly on target domains without fine-tuning. The approach achieves state-of-the-art results on four CD-FSS benchmarks, improving average accuracy over the previous best CD-FSS method by 5.24% in the 1-shot setting and 3.10% in the 5-shot setting, and it performs strongly on domain-heterogeneous datasets such as Chest X-ray and ISIC. Overall, APSeg shows that automatic prompting, coupled with a principled prototype-based transformation, can unlock SAM’s potential for cross-domain segmentation in practical, data-scarce scenarios.
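The TL;DR only states that the shared matrix $W$ maps the fused prototypes onto the anchors, i.e. $WP = A$ with $P$ the prototype matrix and $A$ the anchor matrix. A minimal NumPy sketch of one natural closed form, assuming prototypes and anchors are stored as the $K$ columns of $C \times K$ matrices and that $W$ is taken as the least-squares solution $W = A P^{+}$ via the Moore-Penrose pseudo-inverse. The column layout and the function names `anchor_transform` and `transform_features` are hypothetical, not the authors' code.

```python
import numpy as np


def anchor_transform(P: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Closed-form W with W @ P ~= A (least squares), i.e. W = A @ pinv(P).

    ASSUMPTION: P is (C, K), one fused prototype per column (e.g. K = 2 for
    foreground/background); A is (C, K), one anchor vector per column.
    Returns W with shape (C, C).
    """
    return A @ np.linalg.pinv(P)


def transform_features(F: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Apply the shared linear map W to every pixel of a (C, H, Wd) feature map."""
    C, H, Wd = F.shape
    flat = F.reshape(C, H * Wd)          # one column per pixel
    return (W @ flat).reshape(C, H, Wd)  # same linear map for every pixel
```

When $K \le C$ and $P$ has full column rank, $P^{+}P = I_K$, so $WP = A$ holds exactly rather than only approximately; because the same $W$ is applied to support and query features, both are moved into the anchor-defined space consistently.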

Abstract

Few-shot semantic segmentation (FSS) aims to segment unseen classes given only a few labeled samples. Current FSS methods commonly assume that their training and application scenarios share similar domains, and their performance degrades significantly when they are applied to a distinct domain. To this end, we propose to leverage a cutting-edge foundation model, the Segment Anything Model (SAM), to enhance generalization. However, SAM performs unsatisfactorily on domains that differ from its training data, which primarily comprise natural scene images, and its interactive prompting mechanism does not support automatic segmentation of specific semantics. In this work, we introduce APSeg, a novel auto-prompt network for cross-domain few-shot semantic segmentation (CD-FSS) that automatically generates prompts to guide cross-domain segmentation. Specifically, we propose a Dual Prototype Anchor Transformation (DPAT) module that fuses pseudo query prototypes, extracted via cycle-consistency, with support prototypes, allowing features to be transformed into a more stable domain-agnostic space. Additionally, a Meta Prompt Generator (MPG) module is introduced to automatically generate prompt embeddings, eliminating the need for manual visual prompts. We build an efficient model that can be applied directly to target domains without fine-tuning. Extensive experiments on four cross-domain datasets show that our model outperforms the state-of-the-art CD-FSS method by 5.24% and 3.10% in average accuracy in the 1-shot and 5-shot settings, respectively.
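The abstract says DPAT fuses pseudo query prototypes, selected via cycle-consistency, with support prototypes, but it does not spell out the selection rule. The NumPy sketch below assumes masked average pooling for prototypes, a mutual-nearest-neighbour test as the cycle-consistency criterion, and a fixed convex fusion weight `alpha`; all three choices, and every function name, are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Normalize each column (one feature vector per pixel) to unit length.
    return x / (np.linalg.norm(x, axis=0, keepdims=True) + 1e-8)


def masked_average_pool(feat: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # feat: (C, N) flattened feature map; mask: (N,) binary mask -> (C,) prototype.
    w = mask.astype(feat.dtype)
    return feat @ w / max(float(w.sum()), 1e-8)


def cycle_consistent_mask(f_s: np.ndarray, m_s: np.ndarray, f_q: np.ndarray) -> np.ndarray:
    """ASSUMED CCS rule: keep query pixel j if its nearest support pixel i is
    foreground and i's nearest query pixel is j again (mutual nearest neighbours)."""
    sim = l2_normalize(f_q).T @ l2_normalize(f_s)  # (Nq, Ns) cosine similarity
    nn_s = sim.argmax(axis=1)                      # query pixel -> nearest support pixel
    nn_q = sim.argmax(axis=0)                      # support pixel -> nearest query pixel
    j = np.arange(f_q.shape[1])
    return (m_s[nn_s] > 0) & (nn_q[nn_s] == j)


def fused_prototype(f_s: np.ndarray, m_s: np.ndarray, f_q: np.ndarray,
                    alpha: float = 0.5) -> np.ndarray:
    p_s = masked_average_pool(f_s, m_s)            # support prototype
    sel = cycle_consistent_mask(f_s, m_s, f_q)
    if not sel.any():                              # no consistent pixels: use support only
        return p_s
    p_q = masked_average_pool(f_q, sel)            # pseudo query prototype
    return alpha * p_s + (1.0 - alpha) * p_q       # hypothetical convex fusion
```

The mutual-nearest-neighbour check is deliberately strict: it discards query pixels whose match is ambiguous, which trades coverage for reliability of the pseudo label, and the fallback keeps the prototype defined when nothing passes.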

Paper Structure (23 sections, 9 equations, 9 figures, 6 tables)

Figures (9)

  • Figure 1: (a) In CD-FSS tasks, the training (source) and testing (target) datasets come from different domains, and the categories in the testing dataset are unseen during training. (b) The framework of PerSAM [zhang2023personalize], an existing SAM-based one-shot segmentation method. (c) The framework of our proposed APSeg.
  • Figure 1: Visual comparison of segmentation results with and without cycle-consistent selection (CCS) in the dual prototype anchor transformation (DPAT) module under the 1-shot setting.
  • Figure 2: The overall architecture of our proposed APSeg in a 1-way 1-shot example. After the multi-layer features of the support and query images are obtained, DPAT transforms the domain-specific features into domain-agnostic ones through a linear transformation. The transformed features are then passed into MPG to generate prompt embeddings. Finally, the mask decoder takes the prompt embeddings and the transformed high-level query feature as input to predict the mask of the query image. In the testing phase, the trained model is applied directly to meta-testing in the target domain.
  • Figure 2: Visual comparison of results from APSeg and PerSAM on four target datasets under the 1-shot setting.
  • Figure 3: A visual example of a support-query pair used to perform cycle-consistent selection (CCS).
  • ...and 4 more figures