Knowledge Graph Enhanced Generative Multi-modal Models for Class-Incremental Learning

Xusheng Cao, Haori Lu, Linlan Huang, Fei Yang, Xialei Liu, Ming-Ming Cheng

TL;DR

This work tackles catastrophic forgetting in class-incremental learning by introducing KG-GMM, which incrementally builds a common-sense knowledge graph from relations in ConceptNet for each new class. The method combines a frozen image encoder and text encoder with an LLM-guided token generation workflow, supervised by ground-truth triplets derived from the evolving graph, to maintain detailed knowledge of older classes. At inference, a graph-augmented text output is formed by extracting relation triplets and grounding the prediction in the current subgraph, reducing misclassification among similar classes. Across conventional and few-shot benchmarks (Tiny-ImageNet, ImageNet-R, CIFAR100, Mini-ImageNet), KG-GMM achieves state-of-the-art results with minimal training overhead, demonstrating that structured relational knowledge can robustly preserve prior knowledge in continual learning.

Abstract

Continual learning in computer vision faces the critical challenge of catastrophic forgetting, where models struggle to retain prior knowledge while adapting to new tasks. Although recent studies have attempted to leverage the generalization capabilities of pre-trained models to mitigate overfitting on current tasks, models still tend to forget details of previously learned categories as tasks progress, leading to misclassification. To address these limitations, we introduce a novel Knowledge Graph Enhanced Generative Multi-modal model (KG-GMM) that builds an evolving knowledge graph throughout the learning process. Our approach utilizes relationships within the knowledge graph to augment the class labels and assigns different relations to similar categories to enhance model differentiation. During testing, we propose a Knowledge Graph Augmented Inference method that locates specific categories by analyzing relationships within the generated text, thereby reducing the loss of detailed information about old classes when learning new knowledge and alleviating forgetting. Experiments demonstrate that our method effectively leverages relational information to help the model correct mispredictions, achieving state-of-the-art results in both conventional CIL and few-shot CIL settings, confirming the efficacy of knowledge graphs at preserving knowledge in continual learning scenarios.
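
The Knowledge Graph Augmented Inference step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy subgraph, the "head; Relation tail; ..." text format, the overlap-count scoring rule, and the names `SUBGRAPH`, `extract_relations`, and `graph_augmented_predict` are all assumptions made for illustration. The sketch only shows how relations parsed from generated text could be matched against a class subgraph to correct a prediction between similar classes.

```python
# Hypothetical toy subgraph: class name -> set of (relation, tail) pairs.
# The relation names follow ConceptNet-style labels, but the entries are invented.
SUBGRAPH = {
    "tabby cat": {("IsA", "feline"), ("HasProperty", "striped fur")},
    "tiger": {("IsA", "feline"), ("AtLocation", "jungle")},
    "goldfish": {("IsA", "fish"), ("AtLocation", "bowl")},
}


def extract_relations(generated_text):
    """Split text of the assumed form 'head; Relation tail; Relation tail'."""
    parts = [p.strip() for p in generated_text.split(";") if p.strip()]
    head = parts[0]
    relations = set()
    for part in parts[1:]:
        # The first token is the relation name; the rest is the tail concept.
        relation, _, tail = part.partition(" ")
        if tail:
            relations.add((relation, tail.strip()))
    return head, relations


def graph_augmented_predict(generated_text, subgraph):
    """Return the class whose stored relations best overlap the generated ones."""
    head, relations = extract_relations(generated_text)
    scores = {cls: len(relations & pairs) for cls, pairs in subgraph.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        # No relation matched: fall back to the generated head if it is a known class.
        return head if head in subgraph else None
    if head in scores and scores[head] == scores[best]:
        # On a tie, keep the class name the model generated.
        return head
    return best


# The generated head says "tiger", but the generated relations fit "tabby cat" better.
prediction = graph_augmented_predict(
    "tiger; IsA feline; HasProperty striped fur", SUBGRAPH
)
```

In this toy case the generated class name and the generated relations disagree, and the relation overlap decides the final label. A real system would need a more robust parser and scoring rule than the set intersection assumed here.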

Paper Structure

This paper contains 16 sections, 10 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Difference in the inference pipeline between the generative baseline GMM [cao2024GMM] and our Knowledge Graph enhanced generative multi-modal model (KG-GMM).
  • Figure 2: The knowledge graph construction process. Rectangular nodes represent class nodes, while round nodes represent non-class nodes. Blue rectangles represent classes encountered in task $t-1$, and orange rectangles represent classes from task $t$. Blue arrows represent relations used for learning task $t-1$, and orange arrows represent relations used for task $t$.
  • Figure 3: Left: In knowledge graph-enhanced learning, the image embedding, the knowledge-enhanced ground-truth embedding, and the question embedding are input to the frozen LLM to generate the predicted embedding; a cross-entropy loss updates the linear layer. Right: In knowledge graph-augmented inference, relations extracted from the predicted text are used to create graph-augmented text, which produces the final prediction.
  • Figure 4: Text examples from our method compared with the original GMM.
  • Figure 5: Model accuracy vs. inference time for different maximum token limits configured during inference.
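
The task-by-task graph construction summarized in the Figure 2 caption can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the paper's algorithm: the candidate relations are hand-written stand-ins for ConceptNet lookups, the rule "prefer relations no earlier class already uses" is a simplification of assigning different relations to similar categories, and `EvolvingGraph`, `add_task`, `augmented_label`, and `max_relations` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class EvolvingGraph:
    # class name -> list of (relation, tail) pairs selected for that class
    class_relations: dict = field(default_factory=dict)
    # class name -> index of the task in which the class was added
    class_task: dict = field(default_factory=dict)

    def used_pairs(self):
        """All (relation, tail) pairs already assigned to some class."""
        return {pair for pairs in self.class_relations.values() for pair in pairs}

    def add_task(self, task_id, candidates, max_relations=2):
        """Add the classes of one task; `candidates` maps class -> candidate pairs."""
        for cls, pairs in candidates.items():
            used = self.used_pairs()
            # Assumed selection rule: take unused pairs first so that similar
            # classes end up with different relations, then fill with shared ones.
            fresh = [p for p in pairs if p not in used]
            shared = [p for p in pairs if p in used]
            self.class_relations[cls] = (fresh + shared)[:max_relations]
            self.class_task[cls] = task_id

    def augmented_label(self, cls):
        """Build a relation-augmented text label for a class."""
        relation_text = "; ".join(f"{r} {t}" for r, t in self.class_relations[cls])
        return f"{cls}; {relation_text}" if relation_text else cls


graph = EvolvingGraph()
# Task t-1: hand-written candidates standing in for ConceptNet results.
graph.add_task(0, {"tabby cat": [("IsA", "feline"),
                                 ("HasProperty", "striped fur"),
                                 ("AtLocation", "house")]})
# Task t: "tiger" shares ("IsA", "feline") with an earlier class, so it is skipped.
graph.add_task(1, {"tiger": [("IsA", "feline"),
                             ("AtLocation", "jungle"),
                             ("CapableOf", "roar")]})
tiger_label = graph.augmented_label("tiger")
```

Here the later class receives relations that the earlier, similar class does not use, so the two augmented labels stay distinguishable. The graph only grows as tasks arrive, and earlier assignments are never rewritten.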