Prompt-Driven Continual Graph Learning

Qi Wang, Tianfei Zhou, Ye Yuan, Rui Mao

TL;DR

PromptCGL tackles continual learning on evolving graphs, where replay-based memory is costly and privacy-sensitive. It combines a frozen GNN backbone $g_{\Theta}$, a trainable predictor $f_{\Phi}$, and task-specific prompts $\mathbf{P}$, enhanced by hierarchical prompting and a Personalized Prompt Generator (PG). The method reports state-of-the-art average performance with a prompt memory footprint of $O(k(d_f+d_h))$, which is constant with respect to graph size, using only $k \approx 3$ prompts, and it is robust across backbones on four benchmarks. It trains faster than retraining-based approaches and preserves data privacy by storing prompts instead of historical data.
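
To make the $O(k(d_f+d_h))$ footprint concrete, the following minimal sketch counts stored values under two assumptions that are not stated in this summary: that $d_f$ is the input feature width and $d_h$ the hidden width, and that the bound applies per task. The function names and the example sizes are hypothetical and chosen only for illustration.

```python
def prompt_memory_values(k: int, d_f: int, d_h: int) -> int:
    """Values stored for one task: k prompts of width d_f plus k prompts of width d_h.

    Assumption: d_f is the input feature width and d_h the hidden width.
    """
    return k * (d_f + d_h)


def replay_memory_values(buffer_nodes: int, d_f: int) -> int:
    """Values stored by a replay buffer that caches raw node features only.

    Assumption: edges and labels are ignored, so this is a lower bound.
    """
    return buffer_nodes * d_f


# Hypothetical sizes, for illustration only.
k, d_f, d_h = 3, 128, 256
prompt_cost = prompt_memory_values(k, d_f, d_h)
replay_cost = replay_memory_values(buffer_nodes=1000, d_f=d_f)
print(prompt_cost, replay_cost)
```

The prompt cost depends only on $k$, $d_f$, and $d_h$, so it stays the same however many nodes a task graph contains, whereas the replay cost grows with the number of cached nodes.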

Abstract

Continual Graph Learning (CGL), which aims to accommodate new tasks over evolving graph data without forgetting prior knowledge, is garnering significant research interest. Mainstream solutions adopt the memory replay-based idea, i.e., caching representative data from earlier tasks for retraining the graph model. However, this strategy struggles with scalability issues for constantly evolving graphs and raises concerns regarding data privacy. Inspired by recent advancements in the prompt-based learning paradigm, this paper introduces a novel prompt-driven continual graph learning (PROMPTCGL) framework, which learns a separate prompt for each incoming task and keeps the underlying graph neural network model fixed. In this way, PROMPTCGL naturally avoids catastrophic forgetting of knowledge from previous tasks. More specifically, we propose hierarchical prompting to instruct the model at both the feature and topology levels to fully address the variability of task graphs in dynamic continual learning. Additionally, we develop a personalized prompt generator to generate tailored prompts for each graph node while minimizing the number of prompts needed, leading to constant memory consumption regardless of the graph scale. Extensive experiments on four benchmarks show that PROMPTCGL achieves superior performance against existing CGL approaches while significantly reducing memory consumption. Our code is available at https://github.com/QiWang98/PromptCGL.
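
The hierarchical prompting and personalized prompt generator described above can be sketched as a forward pass. This is a rough illustration, not the authors' implementation: the abstract does not specify how a node is matched against the prompt set, so the softmax-weighted mix of prompts used here is an assumption, as are the mean-aggregation layer, the toy path graph, and every name and size in the code.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def personalized_prompts(x: np.ndarray, prompts: np.ndarray) -> np.ndarray:
    """Mix k shared prompts into one prompt per node.

    x: (n, d) node vectors, prompts: (k, d). Returns (n, d).
    Assumption: matching is dot-product scoring followed by softmax.
    """
    weights = softmax(x @ prompts.T, axis=1)  # (n, k), each row sums to 1
    return weights @ prompts


def gnn_layer(a_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One mean-aggregation layer with ReLU; w stands in for frozen weights."""
    return np.maximum(a_norm @ h @ w, 0.0)


rng = np.random.default_rng(0)
n, d_f, d_h, k = 5, 8, 4, 3  # hypothetical sizes

x = rng.normal(size=(n, d_f))
# Toy path graph with self-loops, row-normalised.
adj = np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
a_norm = adj / adj.sum(axis=1, keepdims=True)

w1 = rng.normal(size=(d_f, d_h))            # frozen backbone weights (never updated)
p_feat = 0.1 * rng.normal(size=(k, d_f))    # feature-level prompt set
p_topo = 0.1 * rng.normal(size=(k, d_h))    # topology-level prompt set

x_prompted = x + personalized_prompts(x, p_feat)       # feature-level prompting
h1 = gnn_layer(a_norm, x_prompted, w1)                 # first layer mixes in neighbours
h1_prompted = h1 + personalized_prompts(h1, p_topo)    # topology-level prompting
print(h1_prompted.shape)
```

Only the two prompt sets (and a predictor, omitted here) would be trained per task in such a setup; the number of stored prompt values is $k(d_f + d_h)$ and does not depend on the number of nodes $n$.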

Paper Structure

This paper contains 32 sections, 9 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Main Idea. Replay-based methods, e.g., CaT [liu2023cat], SSM [zhang2022sparsified], and ER-GNN [zhou2021overcoming], require a memory buffer to store a number of graph nodes per task, which is merged with the incoming graph for model retraining (see (a)). However, they suffer severe degradation when the buffer size decreases (see (b) and (c)). In contrast, PromptCGL represents a novel prompt-based learning paradigm, which learns a fixed number of prompts for each unique task and leaves the GNN parameters unchanged during the continual learning process. As (b) and (c) show, PromptCGL achieves leading performance regardless of the size of the memory buffer.
  • Figure 2: Illustration of the PromptCGL framework. Here we present the execution steps for task $\mathcal{T}_t$. All tasks except $\mathcal{T}_0$ follow the same procedure. The backbone parameters, pre-trained on task $\mathcal{T}_0$, remain frozen in subsequent tasks. Initially, node-level personalized prompts are generated by the personalized prompt generator (PG) based on the query result of the node feature and a small, maintained node-level prompt set; these prompts are then added to the node features. The prompted features are processed by the first GNN layer to obtain node representations with topological information. Subsequently, subgraph-level personalized prompts are generated and added in the same way, and the result is passed to the subsequent layers. Learned prompts are saved into a prompt bank after each task and selected by task identity during inference for prediction.
  • Figure 3: Illustration of the Personalized Prompt Generator.
  • Figure 4: Performance matrix visualization of Joint, Ours, CaT, SSM, MAS and GEM on the CoraFull, Arxiv, Reddit and Products datasets (from top to bottom). Each entry in these matrices represents the performance on task $j$ (column) after learning task $i$ (row). Light colours indicate high accuracy and dark colours indicate low accuracy. Reading column $j$ from top to bottom shows how the model's accuracy on task $\mathcal{T}_j$ changes as later tasks are learned.
  • Figure 5: Visualization of node embeddings learned without (left) and with (right) prompts on four datasets.
  • ...and 2 more figures
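
A performance matrix of the kind shown in Figure 4 can be summarised with two numbers. The sketch below uses the definitions common in the continual learning literature (average performance as the mean of the final row, average forgetting as the mean change between the final row and the diagonal); whether the paper uses exactly these definitions is an assumption, and the matrix values are invented for illustration.

```python
import numpy as np


def average_performance(m: np.ndarray) -> float:
    """Mean accuracy over all tasks after the last task has been learned."""
    m = np.asarray(m, dtype=float)
    return float(m[-1].mean())


def average_forgetting(m: np.ndarray) -> float:
    """Mean of (final accuracy on task j) - (accuracy on task j right after learning it).

    The last task is excluded because nothing is learned after it.
    Negative values indicate forgetting.
    """
    m = np.asarray(m, dtype=float)
    t = m.shape[0]
    if t < 2:
        return 0.0
    return float((m[-1, : t - 1] - np.diag(m)[: t - 1]).mean())


# Hypothetical matrix: row i = after learning task i, column j = accuracy on task j.
m = np.array(
    [
        [0.90, 0.00, 0.00],
        [0.85, 0.80, 0.00],
        [0.80, 0.75, 0.70],
    ]
)
print(average_performance(m), average_forgetting(m))
```

Entries above the diagonal correspond to tasks not yet seen and are ignored by both functions. A method that keeps the backbone frozen and stores one prompt set per task would be expected to show columns that barely change below the diagonal, which corresponds to average forgetting near zero.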