Bridging Granularity Gaps: Hierarchical Semantic Learning for Cross-domain Few-shot Segmentation

Sujun Sun, Haowen Gu, Cheng Xie, Yanxu Ren, Mingwu Ren, Haofeng Zhang

TL;DR

CD-FSS suffers from segmentation granularity gaps between the source training data and the target data, which limit semantic discriminability for novel classes. The authors introduce Hierarchical Semantic Learning (HSL), comprising Dual Style Randomization (DSR), Hierarchical Semantic Mining (HSM), and Prototype Confidence-modulated Thresholding (PCMT), to learn multi-granularity semantic features without target-domain fine-tuning. DSR augments training with foreground and global style variations; HSM builds region-aware prototypes from multi-scale superpixels and fuses low- and high-level cues; and PCMT adaptively thresholds predictions based on prototype confidence to reduce segmentation ambiguity. Across four target-domain datasets, HSL achieves state-of-the-art results, and ablations confirm gains from each component and from the use of multi-scale region priors.
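To make "region-aware prototypes from multi-scale superpixels" concrete, here is a minimal sketch under stated assumptions; it is not the authors' HSM. It assumes superpixel label maps at several scales are already given (the superpixel algorithm itself is not shown), computes one prototype per superpixel by averaging the features inside it, and then fuses each pixel's feature with its region prototypes by a plain average. The helper names `region_prototypes` and `hierarchical_features`, and the choice of averaging as the fusion rule, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional feature vectors
LabelMap = List[List[int]]       # H x W grid of superpixel ids at one scale


def region_prototypes(features: FeatureMap, labels: LabelMap) -> Dict[int, Vector]:
    """Mean feature of every superpixel (masked average pooling per region)."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for feat_row, label_row in zip(features, labels):
        for feat, sp in zip(feat_row, label_row):
            acc = sums.setdefault(sp, [0.0] * len(feat))
            sums[sp] = [a + f for a, f in zip(acc, feat)]
            counts[sp] += 1
    return {sp: [a / counts[sp] for a in acc] for sp, acc in sums.items()}


def hierarchical_features(features: FeatureMap, label_maps: Sequence[LabelMap]) -> FeatureMap:
    """Average each pixel's feature with its region prototype at every superpixel scale."""
    protos = [region_prototypes(features, labels) for labels in label_maps]
    n_views = 1 + len(label_maps)  # the pixel itself plus one prototype per scale
    fused: FeatureMap = []
    for i, feat_row in enumerate(features):
        row: List[Vector] = []
        for j, feat in enumerate(feat_row):
            acc = list(feat)
            for scale_protos, labels in zip(protos, label_maps):
                acc = [a + p for a, p in zip(acc, scale_protos[labels[i][j]])]
            row.append([a / n_views for a in acc])
        fused.append(row)
    return fused


# Toy 1 x 4 map with 1-D features: a fine scale with two superpixels, a coarse scale with one.
feats = [[[0.0], [2.0], [4.0], [6.0]]]
fine = [[0, 0, 1, 1]]
coarse = [[0, 0, 0, 0]]
print(region_prototypes(feats, fine))  # {0: [1.0], 1: [5.0]}
print(hierarchical_features(feats, [fine, coarse])[0][2])  # [4.0]
```

Fine superpixels pull a pixel toward its local part, while coarse superpixels pull it toward a larger region; mixing both is one simple way to expose several granularities in a single feature.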

Abstract

Cross-domain Few-shot Segmentation (CD-FSS) aims to segment novel classes from target domains that are not involved in training and whose data distributions differ significantly from the source domain, using only a few annotated samples. Recent years have witnessed significant progress on this task. However, existing CD-FSS methods primarily focus on style gaps between source and target domains while ignoring segmentation granularity gaps, resulting in insufficient semantic discriminability for novel classes in target domains. To tackle this problem, we propose a Hierarchical Semantic Learning (HSL) framework. Specifically, we introduce a Dual Style Randomization (DSR) module and a Hierarchical Semantic Mining (HSM) module to learn hierarchical semantic features, thereby enhancing the model's ability to recognize semantics at varying granularities. DSR simulates target domain data with diverse foreground-background style differences and overall style variations through foreground and global style randomization, respectively, while HSM leverages multi-scale superpixels to guide the model to mine intra-class consistency and inter-class distinction at different granularities. Additionally, we propose a Prototype Confidence-modulated Thresholding (PCMT) module to mitigate segmentation ambiguity when foreground and background are excessively similar. Extensive experiments on four popular target domain datasets demonstrate that our method achieves state-of-the-art performance.
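The distinction between foreground and global style randomization in DSR can be illustrated with a small sketch. It is a hypothetical stand-in, not the paper's formulation: it assumes "style" means per-channel mean and standard deviation (an AdaIN-style statistics swap), operates on raw pixel values rather than on whatever representation the authors use, and applies foreground randomization before global randomization, which is our own choice. The function names and the `strength` parameter are illustrative.

```python
import random
import statistics
from typing import List, Sequence

Channel = List[float]  # one image channel, flattened to H*W values
Image = List[Channel]  # C channels


def restyle(values: Sequence[float], rng: random.Random, strength: float) -> List[float]:
    """Shift values to a randomly perturbed mean/std (AdaIN-style statistics swap)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values) + 1e-6  # epsilon avoids division by zero
    new_mu = mu * (1.0 + rng.uniform(-strength, strength))
    new_sigma = sigma * (1.0 + rng.uniform(-strength, strength))
    return [(v - mu) / sigma * new_sigma + new_mu for v in values]


def foreground_style_randomization(image: Image, mask: Sequence[int],
                                   rng: random.Random, strength: float = 0.5) -> Image:
    """Re-style only foreground pixels, changing the foreground-background style contrast."""
    fg_idx = [k for k, m in enumerate(mask) if m]
    out: Image = []
    for channel in image:
        new_channel = list(channel)
        if fg_idx:
            restyled = restyle([channel[k] for k in fg_idx], rng, strength)
            for k, v in zip(fg_idx, restyled):
                new_channel[k] = v
        out.append(new_channel)
    return out


def global_style_randomization(image: Image, rng: random.Random, strength: float = 0.5) -> Image:
    """Re-style every channel as a whole, changing the overall image style."""
    return [restyle(channel, rng, strength) for channel in image]


def dual_style_randomization(image: Image, mask: Sequence[int],
                             rng: random.Random, strength: float = 0.5) -> Image:
    """Foreground randomization followed by global randomization (order is an assumption)."""
    fg_styled = foreground_style_randomization(image, mask, rng, strength)
    return global_style_randomization(fg_styled, rng, strength)


rng = random.Random(0)
image = [[0.1, 0.2, 0.8, 0.9]]  # one channel, four pixels
mask = [0, 0, 1, 1]             # last two pixels are foreground
augmented = dual_style_randomization(image, mask, rng)
```

The foreground step alters how the object looks relative to an unchanged background, whereas the global step shifts the whole image; setting `strength=0.0` makes both steps (numerically) the identity.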

Paper Structure

This paper contains 22 sections, 15 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Motivation of the proposed method. (a) Segmentation granularity gaps between the source and target domains; e.g., the foreground-background differences in the target domain resemble the finer-grained differences within the foreground of the source domain. A model trained on the source domain focuses only on single-granularity differences between the training classes and the background, so it struggles to distinguish foreground from background in the target domain. (b) We aim to extract hierarchical semantic features to adapt to target domain data with different segmentation granularities.
  • Figure 2: Overview of our method. We first use the SSM to extract multi-scale superpixel masks for both support and query images. These masks are applied to the DSR and HSM to assist in extracting hierarchical semantic features. Subsequently, support and query images are sequentially fed into the DSR, image encoder, and HSM for data augmentation, feature extraction, and feature enhancement, resulting in hierarchical semantic features. Finally, we compute foreground and background prototypes through SSP. These prototypes and query features are fed into the PCMT to perform query image segmentation.
  • Figure 3: Heatmaps of foreground similarity maps for the source and target domains, demonstrating that our method extracts hierarchical semantic features.
  • Figure 4: Heatmaps of foreground confidence maps and segmentation results obtained with different thresholding methods.
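The thresholding step at the end of the pipeline in Figure 2 and the comparison in Figure 4 can be illustrated with a toy rule. The sketch below is our own hypothetical construction, not the paper's PCMT formula: it assumes foreground and background prototypes are already available (the SSP step is not reproduced), scores each query feature with a two-way softmax over scaled cosine similarities, defines prototype confidence as one minus the cosine similarity between the two prototypes (clamped to [0, 1]), and uses that confidence to blend a fixed 0.5 threshold with the image's mean foreground probability. The temperature of 10 and names such as `confidence_modulated_segment` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-8)


def foreground_probabilities(query: Sequence[Vector], fg_proto: Vector, bg_proto: Vector,
                             temperature: float = 10.0) -> List[float]:
    """Two-way softmax over scaled cosine similarities to the fg and bg prototypes."""
    probs = []
    for f in query:
        margin = temperature * (cosine(f, fg_proto) - cosine(f, bg_proto))
        probs.append(1.0 / (1.0 + math.exp(-margin)))
    return probs


def prototype_confidence(fg_proto: Vector, bg_proto: Vector) -> float:
    """Near 0 when the prototypes are almost identical (ambiguous), 1 when orthogonal or opposed."""
    return min(1.0, max(0.0, 1.0 - cosine(fg_proto, bg_proto)))


def confidence_modulated_segment(query: Sequence[Vector], fg_proto: Vector, bg_proto: Vector,
                                 base_threshold: float = 0.5) -> Tuple[List[int], float]:
    """Binary mask plus the threshold used, blended according to prototype confidence."""
    probs = foreground_probabilities(query, fg_proto, bg_proto)
    conf = prototype_confidence(fg_proto, bg_proto)
    adaptive = sum(probs) / len(probs)  # image-adaptive fallback threshold
    tau = conf * base_threshold + (1.0 - conf) * adaptive
    return [int(p > tau) for p in probs], tau


# Well-separated prototypes: the rule reduces to the usual 0.5 cut.
labels, tau = confidence_modulated_segment([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [0.0, 1.0])
```

When the two prototypes are well separated the threshold stays at 0.5; when they are nearly identical, probabilities crowd around 0.5 and the threshold moves toward the image's own mean probability instead of a fixed value.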