SAM-Aware Graph Prompt Reasoning Network for Cross-Domain Few-Shot Segmentation
Shi-Feng Peng, Guolei Sun, Yong Li, Hongsong Wang, Guo-Sen Xie
TL;DR
The paper tackles cross-domain few-shot segmentation (CD-FSS) by exploiting the cross-domain generalization of SAM. It introduces the SAM-Aware Graph Prompt Reasoning Network (GPRN), which has three components: a SAM-aware prompt initialization (SPI) module that converts SAM masks into semantic-aware visual prompts, a graph prompt reasoning (GPR) module that uses graph attention to aggregate information across prompts, and an adaptive point selection (APS) module that refines predictions, with SSP guiding prototype-based query prediction. The network is trained with a binary cross-entropy loss; APS runs only at test time, selecting point prompts from the query prediction and feeding them back to SAM. Experiments on four CD-FSS datasets show state-of-the-art performance, and ablations indicate that SPI, GPR, and APS each contribute additively to cross-domain generalization and segmentation accuracy in limited-data regimes.
Abstract
The primary challenge of cross-domain few-shot segmentation (CD-FSS) is the domain disparity between the training and inference phases, which can exist in either the input data or the target classes. Previous models struggle to learn feature representations that generalize to various unknown domains from limited training domain samples. In contrast, the large-scale visual model SAM, pre-trained on tens of millions of images from various domains and classes, possesses excellent generalizability. In this work, we propose a SAM-aware graph prompt reasoning network (GPRN) that fully leverages SAM to guide CD-FSS feature representation learning and improve prediction accuracy. Specifically, we propose a SAM-aware prompt initialization module (SPI) to transform the masks generated by SAM into visual prompts enriched with high-level semantic information. Because SAM tends to divide an object into many sub-regions, visual prompts representing the same semantic object may have inconsistent or fragmented features. We therefore propose a graph prompt reasoning (GPR) module that constructs a graph among visual prompts to reason about their interrelationships and enables each visual prompt to aggregate information from similar prompts, thus achieving global semantic consistency. Subsequently, each visual prompt embeds its semantic information into the corresponding mask region to assist in feature representation learning. To refine the segmentation mask during testing, we also design a non-parametric adaptive point selection module (APS) that selects representative point prompts from query predictions and feeds them back to SAM to refine inaccurate segmentation results. Experiments on four standard CD-FSS datasets demonstrate that our method establishes new state-of-the-art results. Code: https://github.com/CVL-hub/GPRN.
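The two prompt-related steps described above (turning SAM masks into visual prompts, then letting similar prompts share information) can be illustrated with a minimal sketch. This is not the authors' implementation: masked average pooling, cosine similarity, the softmax temperature, and the function names `masked_average_pool` and `graph_aggregate` are all assumptions chosen for illustration, and the sketch has no learned weights, whereas the GPR module in the paper is a trained graph attention component.

```python
import math

def masked_average_pool(features, mask):
    """Average the per-pixel feature vectors where mask == 1 (assumed prompt init)."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        return [0.0] * len(features[0])
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def graph_aggregate(prompts, temperature=0.1):
    """One round of softmax-weighted aggregation over a fully connected prompt graph.

    Each prompt becomes a weighted mean of all prompts, with weights given by
    a softmax over scaled cosine similarities (an assumed, non-learned stand-in
    for graph attention).
    """
    refined = []
    for p in prompts:
        scores = [cosine(p, q) / temperature for q in prompts]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(p)
        refined.append(
            [sum(w * q[d] for w, q in zip(weights, prompts)) / total for d in range(dim)]
        )
    return refined

# Toy data: 4 pixels with 2-dimensional features (hypothetical values).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
# Three hypothetical SAM masks; the first two cover fragments of the same object.
masks = [[1, 0, 0, 0], [0, 1, 0, 1], [0, 0, 1, 0]]

prompts = [masked_average_pool(features, m) for m in masks]
refined = graph_aggregate(prompts)
```

In this toy setup the first two prompts come from fragments of one object, so aggregation pulls them toward each other, while the third prompt, which is nearly orthogonal to them, stays almost unchanged. That is the kind of consistency among fragmented prompts that the abstract attributes to GPR.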
