Symmetrical Joint Learning Support-query Prototypes for Few-shot Segmentation

Qun Li, Baoquan Sun, Fu Xiao, Yonggang Qi, Bir Bhanu

TL;DR

Sym-Net tackles Few-Shot Segmentation by enforcing symmetry in learning query and support prototypes, thereby mitigating intra-class variation. It introduces three integrated modules: Self-activation based Prior Mask (SPM) to locate query regions without learned parameters, Visual-text Alignment Prototype Aggregation (APA) to fuse visual cues with semantic embeddings for robust prototypes, and Top-Down Hyper-Correlation (TDC) to recover multi-scale spatial relations lost during prototype compression. A co-optimized hard triplet loss further aligns the query and support prototypes, yielding hybrid prototypes that generalize to unseen classes. Empirical results on PASCAL-$5^i$ and COCO-$20^i$ demonstrate state-of-the-art performance in both $1$- and $5$-shot settings, with ablations confirming the contribution of each module to improved robustness and segmentation accuracy.

Abstract

We propose Sym-Net, a novel framework for Few-Shot Segmentation (FSS) that addresses the critical issue of intra-class variation by jointly learning both query and support prototypes in a symmetrical manner. Previous methods generate query prototypes solely by matching query features to support prototypes, which biases learning towards the few-shot support samples. In contrast, Sym-Net learns query and support prototypes in a balanced, symmetrical way, so that the learning process does not favor one set (support or query) over the other. One of the main modules of Sym-Net is the visual-text alignment-based prototype aggregation module. It is not merely query-guided prototype refinement: it learns jointly from both support and query samples, which helps the model handle intra-class discrepancies and generalize better to new, unseen classes. Specifically, a parameter-free prior mask generation module is designed to accurately localize both local and global regions of the query object by using sliding windows of different sizes and a self-activation kernel to suppress incorrect background matches. Additionally, to address the information loss caused by spatial pooling during prototype learning, a top-down hyper-correlation module is integrated to capture multi-scale spatial relationships between support and query images. The whole approach is further optimized jointly through a co-optimized hard triplet mining strategy. Experimental results show that the proposed Sym-Net outperforms state-of-the-art models, which demonstrates that jointly learning support-query prototypes in a symmetrical manner for FSS offers a promising direction to enhance segmentation performance with limited annotated data.
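
The abstract names a co-optimized hard triplet mining strategy but does not give its formulation here. As a rough illustration only, the sketch below shows a generic batch-hard triplet loss applied in both directions (query to support and support to query), which is one way a symmetric objective over prototypes could look. The function names, the Euclidean distance, the margin value, and the simple averaging of the two directions are all assumptions made for this example, not the authors' definition.

```python
import numpy as np


def batch_hard_triplet(anchors, candidates, anchor_labels, candidate_labels, margin=0.5):
    """Generic batch-hard triplet loss from anchors (A, C) to candidates (B, C).

    Hypothetical helper: for each anchor, take the farthest same-class
    candidate and the closest different-class candidate.
    """
    anchor_labels = np.asarray(anchor_labels)
    candidate_labels = np.asarray(candidate_labels)
    # Pairwise Euclidean distances, shape (A, B).
    diff = anchors[:, None, :] - candidates[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    same = anchor_labels[:, None] == candidate_labels[None, :]
    # Hardest positive: largest distance among same-class pairs.
    hardest_pos = np.where(same, dist, -np.inf).max(axis=1)
    # Hardest negative: smallest distance among different-class pairs.
    hardest_neg = np.where(~same, dist, np.inf).min(axis=1)
    # Anchors lacking a positive or a negative contribute zero after the hinge.
    return float(np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean())


def symmetric_triplet(query_protos, support_protos, query_labels, support_labels, margin=0.5):
    """Average the loss over both directions (an assumed form of symmetry)."""
    q2s = batch_hard_triplet(query_protos, support_protos, query_labels, support_labels, margin)
    s2q = batch_hard_triplet(support_protos, query_protos, support_labels, query_labels, margin)
    return 0.5 * (q2s + s2q)


# Toy prototypes: two classes, one query and one support prototype each.
query = np.array([[0.0, 0.0], [10.0, 0.0]])
support = np.array([[0.0, 1.0], [10.0, 1.0]])
aligned = symmetric_triplet(query, support, [0, 1], [0, 1])
misaligned = symmetric_triplet(query, support, [0, 1], [1, 0])
```

With these toy inputs the loss vanishes when same-class prototypes are already close, and it grows when the support labels are swapped so that each query prototype sits next to the wrong class.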

Paper Structure

This paper contains 22 sections, 15 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Existing prototype-based methods vs. our proposed Sym-Net. Most current work (a) generates the query prototypes by similarity matching between the query features and the support prototypes. In contrast, our Sym-Net (b) generates the query prototypes in a symmetrical manner, consistent with the support prototype generation scheme and associated with the prior query mask. The proposed method is a query-support hybrid prototype learning framework that matches the query-support hybrid prototypes with the query features.
  • Figure 2: Schematic overview of Self-activation based Prior Mask generation (SPM). Given a pair of region features $r_q$ and $r_s$, we perform region-wise similarity calculations based on a self-activation kernel, which yields the region-based affinity matrix $\mathbf S_r$ of matching scores. Subsequently, we calculate the average of $\mathbf S_r$ along the second dimension, denoted $Mean(\mathbf S_{r};1)$, apply min-max normalization, and reshape the result to obtain the region-based similarity map $\mathbf{M}_{r}$.
  • Figure 3: Architecture of the Visual-text Alignment based Prototype Aggregation (APA) module for the query branch.
  • Figure 4: Visualization of the prior query mask and segmentation of Sym-Net (ours) vs. MIANet. The prior mask provides a preliminary indication of the approximate position of the query object. Benefiting from the prior query mask generated by the self-activation based prior mask generation module (SPM), more accurate segmentation is achieved, even in hard cases. Note that we average four-scale masks from MIANet for a fair comparison. Best viewed in color.
  • Figure 5: Comparison with state-of-the-art methods in mIoU under the 1-shot and 5-shot settings on the PASCAL-$5^i$ dataset. Bold denotes the best result and underline denotes the second-best result.
  • ...and 4 more figures
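
The pipeline in the Figure 2 caption (affinity matrix, mean along the second dimension, min-max normalization, reshape) can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: plain cosine similarity stands in for the self-activation kernel, whose exact form is not given on this page; rows of the affinity matrix are assumed to index query regions and columns to index support regions; and the region features are assumed to be already pooled from sliding windows. The function name `prior_mask` and its parameters are hypothetical.

```python
import numpy as np


def prior_mask(query_regions, support_regions, height, width, eps=1e-7):
    """Parameter-free prior mask sketch.

    query_regions:   (height * width, C) region features r_q
    support_regions: (N, C) region features r_s
    Returns an array of shape (height, width) with values in [0, 1].
    """
    # L2-normalize so the dot product is a cosine similarity
    # (assumed stand-in for the self-activation kernel).
    q = query_regions / (np.linalg.norm(query_regions, axis=1, keepdims=True) + eps)
    s = support_regions / (np.linalg.norm(support_regions, axis=1, keepdims=True) + eps)
    affinity = q @ s.T                    # S_r, shape (height * width, N)
    scores = affinity.mean(axis=1)        # Mean(S_r; 1): average over support regions
    # Min-max normalization to [0, 1].
    scores = (scores - scores.min()) / (scores.max() - scores.min() + eps)
    return scores.reshape(height, width)  # M_r


# Toy example: four query regions on a 2x2 grid, one support region.
query_feats = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
support_feats = np.array([[1.0, 0.0]])
mask = prior_mask(query_feats, support_feats, height=2, width=2)
```

In the toy example, the query region that points in the same direction as the support feature receives the highest score and the opposite one the lowest, so the mask gives a coarse indication of where the query object is likely to be.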