
Contrastive Augmented Graph2Graph Memory Interaction for Few Shot Continual Learning

Biqing Qi, Junqi Gao, Xingquan Chen, Dong Li, Jianxing Liu, Ligang Wu, Bowen Zhou

TL;DR

To address sample scarcity in classes from new sessions, the paper introduces Contrast-Augmented G2G (CAG2G), which promotes the aggregation of same-class features and thereby aids few-shot learning. Extensive comparisons on CIFAR100, CUB200, and the challenging ImageNet-R dataset demonstrate the method's superiority over existing methods.

Abstract

Few-Shot Class-Incremental Learning (FSCIL) has gained considerable attention in recent years for its pivotal role in handling continuously arriving classes. However, it also poses additional challenges. The scarcity of samples in new sessions intensifies overfitting, which makes the output features of new and old classes incompatible and thereby escalates catastrophic forgetting. A prevalent strategy is to mitigate catastrophic forgetting with an Explicit Memory (EM) composed of class prototypes. However, current EM-based methods retrieve memory globally by performing Vector-to-Vector (V2V) interaction between the features of the input and the prototypes stored in EM, neglecting the geometric structure of local features. This hinders accurate modeling of their positional relationships. To incorporate local geometric structure, we extend V2V interaction to Graph-to-Graph (G2G) interaction. To strengthen local structures for better G2G alignment and to prevent local feature collapse, we propose the Local Graph Preservation (LGP) mechanism. Additionally, to address sample scarcity in classes from new sessions, we introduce Contrast-Augmented G2G (CAG2G), which promotes the aggregation of same-class features and thereby aids few-shot learning. Extensive comparisons on CIFAR100, CUB200, and the challenging ImageNet-R dataset demonstrate the superiority of our method over existing methods.
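As a toy illustration of the motivation above (not the paper's G2G interactor), the sketch below builds two sets of local features whose mean-pooled vectors are identical but whose internal geometry differs. A V2V comparison by cosine similarity of the pooled vectors cannot tell them apart, while a comparison of local graphs can. Representing each local graph by its pairwise Euclidean distance matrix and assuming a fixed node-to-node correspondence are simplifying assumptions; all helper names are hypothetical.

```python
import numpy as np


def pooled_vector(local_feats: np.ndarray) -> np.ndarray:
    """V2V view: collapse (num_nodes, dim) local features into one vector by mean pooling."""
    return local_feats.mean(axis=0)


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def local_graph(local_feats: np.ndarray) -> np.ndarray:
    """Hypothetical G2G view: edge weights are pairwise Euclidean distances between nodes."""
    diff = local_feats[:, None, :] - local_feats[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def graph_discrepancy(g1: np.ndarray, g2: np.ndarray) -> float:
    """Frobenius distance between two graphs, assuming node i corresponds to node i."""
    return float(np.linalg.norm(g1 - g2))


center = np.array([2.0, 1.0])
# Prototype: four local features arranged as a square around `center`.
prototype = center + np.array([[1, 1], [1, -1], [-1, -1], [-1, 1]], dtype=float)
# Input A: same local geometry as the prototype.
feats_a = prototype.copy()
# Input B: local features collapsed onto a line, but with the same mean.
feats_b = center + np.array([[2, 0], [1, 0], [-1, 0], [-2, 0]], dtype=float)

proto_vec = pooled_vector(prototype)
cos_a = cosine(pooled_vector(feats_a), proto_vec)
cos_b = cosine(pooled_vector(feats_b), proto_vec)
graph_a = graph_discrepancy(local_graph(feats_a), local_graph(prototype))
graph_b = graph_discrepancy(local_graph(feats_b), local_graph(prototype))

print(f"V2V cosine:      A={cos_a:.4f}  B={cos_b:.4f}")  # identical: pooling hides the structure
print(f"G2G discrepancy: A={graph_a:.4f}  B={graph_b:.4f}")  # only A matches the prototype graph
```

Both inputs look equally close to the prototype under the pooled-vector comparison, whereas the graph comparison separates them; this mirrors, in a deliberately simplified form, the ambiguity that Figure 1 attributes to V2V alignment.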

Paper Structure (31 sections, 25 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Diagram illustrating our motivation. In metric-dependent vector-to-vector alignment, multiple local alignment relationships can exist between the features used for interaction and the prototypes in explicit memory while yielding the same Euclidean distance or cosine similarity between the vectors. As a result, global vector-to-vector interaction models the positional relationship between features and class prototypes imprecisely. In contrast, G2G alignment imposes stronger structural constraints, enabling more accurate modeling of the positional relationship between features and class prototypes.
  • Figure 2: Overview of our method. Each input image $\boldsymbol{x}$ is fed, together with its augmented view $\tilde{\boldsymbol{x}}$, into the pre-trained ViT. The resulting features $f_{\boldsymbol{\varphi}}(\boldsymbol{x})$ and $f_{\boldsymbol{\varphi}}(\tilde{\boldsymbol{x}})$ are then passed to the interactor $g_{\boldsymbol{\theta}}$ for G2G interaction. During training, we perform G2G alignment between the features of the original and augmented views output by $g_{\boldsymbol{\theta}}$. This improves intra-class concentration; at the same time, the output features of both views are aligned with the ground-truth label (denoted $y^*$) through G2G.
  • Figure 3: Ablation results on ImageNet-R. (a) shows the performance of our method on the evaluation set $\mathcal{E}^{(t)}$ of all previous sessions under different choices of $S$. In (b), the three bars in each group show, from left to right, for the corresponding choice of $\lambda$: the average accuracy of our method in each incremental session, the average accuracy on the evaluation set $\mathcal{E}^{(t)}$ after each incremental session, and the final evaluation-set accuracy after the last session. (c) shows violin plots of the test accuracy in each few-shot incremental session for each choice of $\eta$; the dashed line indicates the final test accuracy on $\mathcal{E}^{(T)}$ after all incremental training is complete.
  • Figure 4: Ablation results on CIFAR100 (top row of three panels) and CUB200 (bottom row of three panels). (a) and (d) show the performance of our method on the evaluation set $\mathcal{E}^{(t)}$ of all previous sessions under different choices of $S$. In (b) and (e), the three bars in each group show, from left to right, for the corresponding choice of $\lambda$: the average accuracy of our method in each incremental session, the average accuracy on the evaluation set $\mathcal{E}^{(t)}$ after each incremental session, and the final evaluation-set accuracy after the last session. (c) and (f) show violin plots of the test accuracy in each few-shot incremental session for each choice of $\eta$; the dashed line indicates the final test accuracy on $\mathcal{E}^{(T)}$ after all incremental training is complete.
  • Figure 5: The left histogram shows the distances between the node features learned by different GNN models and their class centers. The right figure shows the distances between the anchor points learned by different GNN models and the actual samples; samples marked in green are the actual anchor samples, and samples marked in purple do not belong to this category.
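Figure 5 gauges intra-class concentration through the distance from each node feature to its class center. A minimal sketch of that measurement follows, assuming the class center is the per-class mean feature (the paper's exact definition may differ); the function name and the toy features are hypothetical and are not results from the paper.

```python
import numpy as np


def distances_to_class_centers(features, labels) -> np.ndarray:
    """Euclidean distance from each feature to the mean feature of its own class.

    Assumption: the class center is the per-class mean; this is an illustrative
    choice, not necessarily the definition used in the paper.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dists = np.empty(len(features))
    for c in np.unique(labels):
        mask = labels == c
        center = features[mask].mean(axis=0)
        dists[mask] = np.linalg.norm(features[mask] - center, axis=1)
    return dists


labels = [0, 0, 1, 1]
# Hypothetical toy features: one tightly clustered set, one spread-out set.
tight = [[0.0, 0.0], [0.0, 0.2], [5.0, 5.0], [5.0, 5.2]]
spread = [[0.0, 0.0], [0.0, 2.0], [5.0, 5.0], [5.0, 7.0]]

d_tight = distances_to_class_centers(tight, labels)
d_spread = distances_to_class_centers(spread, labels)

# Bin the pooled distances as a histogram would; smaller distances mean tighter classes.
counts, edges = np.histogram(np.concatenate([d_tight, d_spread]), bins=4)
print("mean distance (tight): ", d_tight.mean())
print("mean distance (spread):", d_spread.mean())
print("histogram counts:", counts)
```

Computing these distances for the features produced by different models and plotting them side by side gives a comparison of the kind summarized in Figure 5's histogram.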