Symmetrical Joint Learning Support-query Prototypes for Few-shot Segmentation
Qun Li, Baoquan Sun, Fu Xiao, Yonggang Qi, Bir Bhanu
TL;DR
Sym-Net tackles Few-Shot Segmentation by enforcing symmetry in learning query and support prototypes, thereby mitigating intra-class variation. It introduces three integrated modules: Self-activation based Prior Mask (SPM) to locate query regions without learned parameters, Visual-text Alignment Prototype Aggregation (APA) to fuse visual cues with semantic embeddings for robust prototypes, and Top-Down Hyper-Correlation (TDC) to recover multi-scale spatial relations lost during prototype compression. A co-optimized hard triplet loss further aligns the query and support prototypes, yielding hybrid prototypes that generalize to unseen classes. Empirical results on PASCAL-$5^i$ and COCO-$20^i$ demonstrate state-of-the-art performance in both $1$- and $5$-shot settings, with ablations confirming the contribution of each module to improved robustness and segmentation accuracy.
Abstract
We propose Sym-Net, a novel framework for Few-Shot Segmentation (FSS) that addresses the critical issue of intra-class variation by jointly learning query and support prototypes in a symmetrical manner. Previous methods generate query prototypes solely by matching query features to support prototypes, which biases learning towards the few-shot support samples. Sym-Net instead learns query and support prototypes in a balanced, symmetrical way, so that the learning process does not favor one set (support or query) over the other. One of the main modules of Sym-Net is the visual-text alignment-based prototype aggregation module. It is not merely a query-guided prototype refinement: it learns jointly from both support and query samples, which helps the model handle intra-class discrepancies and generalize better to new, unseen classes. A parameter-free prior mask generation module is also designed to accurately localize both local and global regions of the query object, using sliding windows of different sizes and a self-activation kernel to suppress incorrect background matches. To address the information loss caused by spatial pooling during prototype learning, a top-down hyper-correlation module is integrated to capture multi-scale spatial relationships between support and query images. The whole framework is further jointly optimized with a co-optimized hard triplet mining strategy. Experimental results show that the proposed Sym-Net outperforms state-of-the-art models, which demonstrates that jointly learning support-query prototypes in a symmetrical manner is a promising direction for improving FSS performance with limited annotated data.
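The hard triplet mining mentioned in the abstract can be illustrated with a generic batch-hard triplet loss over prototype vectors. The sketch below is not the Sym-Net implementation: the function name `batch_hard_triplet_loss`, the Euclidean distance, the margin value, and the idea of stacking support and query prototypes into one array with class labels are all assumptions made for illustration. For each anchor prototype it selects the farthest prototype of the same class and the nearest prototype of a different class, then applies a hinge on the gap between the two distances.

```python
import numpy as np

def batch_hard_triplet_loss(prototypes, labels, margin=0.5):
    """Generic batch-hard triplet loss (illustrative, not the paper's code).

    prototypes: (N, D) array, e.g. support and query prototypes stacked.
    labels:     (N,) class id of each prototype.
    margin:     hypothetical margin value chosen for this sketch.
    """
    prototypes = np.asarray(prototypes, dtype=np.float64)
    labels = np.asarray(labels)

    # Pairwise Euclidean distances via broadcasting, shape (N, N).
    diff = prototypes[:, None, :] - prototypes[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    same = labels[:, None] == labels[None, :]
    eye = np.eye(len(labels), dtype=bool)
    pos_mask = same & ~eye   # same class, excluding the anchor itself
    neg_mask = ~same         # different class

    # Only anchors with at least one positive and one negative form a triplet.
    valid = pos_mask.any(axis=1) & neg_mask.any(axis=1)
    if not valid.any():
        return 0.0

    # Hardest positive: farthest same-class prototype.
    hardest_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)
    # Hardest negative: nearest different-class prototype.
    hardest_neg = np.where(neg_mask, dist, np.inf).min(axis=1)

    losses = np.maximum(0.0, hardest_pos[valid] - hardest_neg[valid] + margin)
    return float(losses.mean())


# Toy example: two classes, each with a support and a query prototype.
protos = [[0.0, 0.0], [3.0, 0.0],   # class 0
          [1.0, 0.0], [4.0, 0.0]]   # class 1
loss = batch_hard_triplet_loss(protos, [0, 0, 1, 1], margin=0.5)
```

In this toy example the same-class prototypes are farther apart than the nearest prototypes of the other class, so the loss is positive; minimizing it would pull the support and query prototypes of each class together, which is the kind of alignment the abstract describes.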
