Enhancing Generalized Few-Shot Semantic Segmentation via Effective Knowledge Transfer

Xinyue Chen, Miaojing Shi, Zijian Zhou, Lianghua He, Sophia Tsoka

TL;DR

GFSS aims to segment both base and novel classes under data imbalance, but the distribution gap between base and novel classes hampers performance. The paper introduces GFSS-EKT, which incorporates Novel Prototype Modulation (NPM), Novel Classifier Calibration (NCC), and Context Consistency Learning (CCL) into a two-phase training framework to transfer knowledge effectively from base to novel classes. NPM uses cross-attention to modulate novel prototypes with base prototypes, NCC aligns the novel classifier's weights to the base classifier's distribution by adjusting their mean and variance, and CCL exploits context from base images through a context-aware augmentation and a consistency loss. Experiments on PASCAL-5$^i$ and COCO-20$^i$ show state-of-the-art results, especially in 1-shot settings, with higher novel-class mIoU and balanced performance across base and novel classes.
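The NPM step summarized above can be illustrated with a minimal cross-attention sketch in Python/NumPy, assuming the novel prototypes act as queries and the base prototypes as keys and values. The function name `modulate_novel_prototypes`, the omission of learned query/key/value projections, and the residual weight `alpha` are hypothetical simplifications for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def modulate_novel_prototypes(novel_protos, base_protos, alpha=0.5):
    """Hypothetical cross-attention modulation of novel prototypes.

    novel_protos: (N_n, d) array used as queries.
    base_protos:  (N_b, d) array used as keys and values.
    Learned Q/K/V projections are omitted (identity) for brevity, and
    alpha is an illustrative residual weight, not a value from the paper.
    """
    d = novel_protos.shape[1]
    # Scaled dot-product similarity between each novel and each base prototype
    scores = novel_protos @ base_protos.T / np.sqrt(d)  # (N_n, N_b)
    attn = softmax(scores, axis=1)                       # each row sums to 1
    # Aggregate base prototypes weighted by their correlation with each novel class
    context = attn @ base_protos                         # (N_n, d)
    # Residual update keeps the original novel-prototype information
    return novel_protos + alpha * context, attn


rng = np.random.default_rng(0)
base = rng.normal(size=(15, 64))   # e.g. 15 base classes, as in one PASCAL-5^i fold
novel = rng.normal(size=(5, 64))   # 5 novel classes
modulated, attn = modulate_novel_prototypes(novel, base)
print(modulated.shape, attn.shape)  # (5, 64) (5, 15)
```

The residual form means that setting `alpha=0` returns the novel prototypes unchanged, so the base-class correlations act as an additive correction rather than a replacement.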

Abstract

Generalized few-shot semantic segmentation (GFSS) aims to segment objects of both base and novel classes, using sufficient samples of base classes and only a few samples of novel classes. Representative GFSS approaches typically employ a two-phase training scheme, base class pre-training followed by novel class fine-tuning, to learn the classifiers for base and novel classes respectively. Nevertheless, a distribution gap exists between base and novel classes in this process. To narrow this gap, we exploit effective knowledge transfer from base to novel classes. First, a novel prototype modulation module is designed to modulate novel class prototypes by exploiting the correlations between base and novel classes. Second, a novel classifier calibration module is proposed to calibrate the weight distribution of the novel classifier according to that of the base classifier. Furthermore, existing GFSS approaches lack contextual information for novel classes because of their limited samples; we therefore introduce a context consistency learning scheme to transfer contextual knowledge from base to novel classes. Extensive experiments on PASCAL-5$^i$ and COCO-20$^i$ demonstrate that our approach significantly advances the state of the art in the GFSS setting. The code is available at: https://github.com/HHHHedy/GFSS-EKT.
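The context consistency idea can be sketched as follows, with the caveat that this summary does not give the paper's exact augmentation or loss. As assumptions, the sketch uses a copy-paste augmentation that places novel-class pixels inside a base-class image, and a per-pixel KL divergence between the class distributions predicted for the novel region in its original and in its pasted context. The names `paste_novel_into_base` and `consistency_loss` are hypothetical, and random arrays stand in for the segmentation model's logits.

```python
import numpy as np


def paste_novel_into_base(novel_img, novel_mask, base_img):
    """Hypothetical context-aware augmentation: copy the novel-class pixels
    (novel_mask == True) onto a base-class image, so the novel object
    appears inside base-class context. Images are (H, W, C); mask is (H, W)."""
    out = base_img.copy()
    out[novel_mask] = novel_img[novel_mask]
    return out


def log_softmax(x, axis=0):
    # Numerically stable log-softmax over the class axis
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def consistency_loss(logits_orig, logits_aug, mask):
    """Mean per-pixel KL(p_orig || p_aug) over the masked region.
    logits_*: (K, H, W) class scores; mask: (H, W) boolean array."""
    logp = log_softmax(logits_orig, axis=0)
    logq = log_softmax(logits_aug, axis=0)
    kl = (np.exp(logp) * (logp - logq)).sum(axis=0)  # (H, W)
    return float(kl[mask].mean()) if mask.any() else 0.0


rng = np.random.default_rng(0)
H, W, K = 8, 8, 4
novel_img = rng.random((H, W, 3))
base_img = rng.random((H, W, 3))
mask = np.zeros((H, W), dtype=bool)
mask[2:5, 2:5] = True  # region occupied by the novel object

aug = paste_novel_into_base(novel_img, mask, base_img)
# In practice these would be model outputs on novel_img and aug
logits_a = rng.normal(size=(K, H, W))
logits_b = rng.normal(size=(K, H, W))
loss = consistency_loss(logits_a, logits_b, mask)
```

Because the object is pasted at the same location, the two prediction maps can be compared pixel by pixel on the mask; a training setup would typically also stop gradients through one of the two branches, which is a design choice the sketch leaves out.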

Paper Structure

This paper contains 30 sections, 5 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Illustration of (a) FSS and (b) GFSS: FSS models predict only the novel class specified by the support image, whereas GFSS models can predict both base and novel classes at the same time. During inference, GFSS models no longer rely on support images of novel classes, as they are fine-tuned on all samples of novel classes to form a novel classifier. In this example, "horse" is a base class, while "person" is a novel class.
  • Figure 2: Overview of our model. Novel prototype modulation (NPM) module, novel classifier calibration (NCC) module, and context consistency loss (CCL) module are shown in the boxes with green, yellow, and orange backgrounds respectively.
  • Figure 3: (a) illustrates the weight distributions of two classifiers without NCC; (b) shows their weight distributions after implementing NCC.
  • Figure 4: Qualitative results of our method and POP liu2023learning on PASCAL-5$^i$.
  • Figure 5: t-SNE visualization of features and class prototypes. (a) illustrates the t-SNE visualization of POP liu2023learning; (b) shows the t-SNE visualization of our proposed method. The base prototypes ($\mathcal{U}_b$) are represented as blue pentagrams, and the novel prototypes ($\mathcal{U}_n$) are shown as red rectangles. The features of base classes ($\mathcal{F}_b$) and novel classes ($\mathcal{F}_n$) are visualized as dots, where features of the same class are clustered and represented by dots of the same color.
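Figure 3 contrasts the classifier weight distributions before and after NCC, and the TL;DR describes NCC as a mean/variance adjustment of the novel classifier toward the base classifier. A minimal sketch of such an adjustment, assuming statistics are computed over all entries of each weight matrix (the paper may instead use per-dimension or per-class statistics), standardizes the novel weights and rescales them to the base mean and standard deviation. `calibrate_novel_classifier` and `eps` are hypothetical names.

```python
import numpy as np


def calibrate_novel_classifier(w_novel, w_base, eps=1e-6):
    """Hypothetical NCC-style calibration.

    Shifts and scales the novel classifier weights so their overall mean and
    standard deviation match those of the base classifier weights. Statistics
    are taken over the whole matrix, which is an assumption of this sketch.
    """
    mu_n, std_n = w_novel.mean(), w_novel.std()
    mu_b, std_b = w_base.mean(), w_base.std()
    # Standardize the novel weights, then map them onto the base distribution
    return (w_novel - mu_n) / (std_n + eps) * std_b + mu_b


rng = np.random.default_rng(0)
w_base = rng.normal(0.0, 0.02, size=(15, 512))  # well-trained base classifier
w_novel = rng.normal(0.1, 0.3, size=(5, 512))   # few-shot novel classifier, shifted and wider
w_cal = calibrate_novel_classifier(w_novel, w_base)
print(np.isclose(w_cal.mean(), w_base.mean()),
      np.isclose(w_cal.std(), w_base.std(), rtol=1e-4))  # True True
```

The affine map preserves the relative ordering of the novel weights while removing the scale mismatch that would otherwise let one classifier dominate the shared prediction.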