PUMA: Efficient Continual Graph Learning for Node Classification with Graph Condensation

Yilun Liu, Ruihong Qiu, Yanran Tang, Hongzhi Yin, Zi Huang

TL;DR

PUMA tackles catastrophic forgetting in continual graph learning by extending CaT with a pseudo-label guided memory bank that includes unlabelled nodes, a retraining strategy to balance knowledge between old and new graphs, and efficiency enhancements via one-time propagation, wide graph encoders, and edge-free memories trained with an MLP. It uses distribution matching and MMD-based objectives to condense streaming graphs into informative, compact replay data, enabling effective learning from limited memory budgets. Extensive experiments across six datasets show state-of-the-art or competitive performance with significantly improved efficiency, while ablations confirm the value of pseudo-labeling and retraining. The approach offers a practical, scalable solution for continual node classification on evolving graphs, with clear gains in memory efficiency and learning stability in real-world streaming scenarios.
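
The distribution-matching idea mentioned above can be illustrated with a minimal sketch. It assumes a linear-kernel form of MMD, in which the loss for each class is the squared distance between the mean embeddings of the real and the condensed nodes under a randomly initialised one-layer ReLU encoder. The function names, the encoder, and the toy data are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import numpy as np


def mean_embedding_gap(real_x, syn_x, weight):
    """Squared distance between mean embeddings (linear-kernel MMD estimate)."""
    # Hypothetical encoder: one random linear layer followed by ReLU.
    real_h = np.maximum(real_x @ weight, 0.0)
    syn_h = np.maximum(syn_x @ weight, 0.0)
    diff = real_h.mean(axis=0) - syn_h.mean(axis=0)
    return float(diff @ diff)


def class_wise_matching_loss(real_x, real_y, syn_x, syn_y, weight):
    """Sum the per-class embedding gaps over the classes held in memory."""
    loss = 0.0
    for label in np.unique(syn_y):
        real_c = real_x[real_y == label]
        syn_c = syn_x[syn_y == label]
        if len(real_c) == 0:
            continue  # class absent from the incoming graph
        loss += mean_embedding_gap(real_c, syn_c, weight)
    return loss


# Toy data (assumption): 200 real nodes, 8 features, 2 classes.
rng = np.random.default_rng(0)
real_y = rng.integers(0, 2, size=200)
real_x = rng.normal(size=(200, 8))
real_x[real_y == 1] += 2.0  # shift class 1 so the classes differ
weight = rng.normal(size=(8, 64))  # random encoder with hidden width 64

# Edge-free memory: 3 synthetic nodes per class, features only.
syn_y = np.array([0, 0, 0, 1, 1, 1])
syn_x = rng.normal(size=(6, 8))
loss = class_wise_matching_loss(real_x, real_y, syn_x, syn_y, weight)
print(f"matching loss for a random memory: {loss:.4f}")
```

In a condensation loop, `syn_x` would be the trainable quantity, updated by gradient descent to reduce this loss, typically with a fresh random `weight` at each step.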

Abstract

When handling streaming graphs, existing graph representation learning models encounter a catastrophic forgetting problem, where previously learned knowledge is easily overwritten when learning from newly incoming graphs. In response, Continual Graph Learning (CGL) emerges as a novel paradigm enabling graph representation learning from streaming graphs. Our prior work, Condense and Train (CaT), is a replay-based CGL framework with a balanced continual learning procedure, which designs a small yet effective memory bank for replaying. Although CaT alleviates the catastrophic forgetting problem, three issues remain: (1) the graph condensation focuses only on labelled nodes while neglecting the abundant information carried by unlabelled nodes; (2) the continual training scheme of CaT overemphasises previously learned knowledge, limiting the model's capacity to learn from newly added memories; (3) both the condensation process and the replaying process of CaT are time-consuming. In this paper, we propose a PsUdo-label guided Memory bAnk (PUMA) CGL framework, which extends CaT and improves its efficiency and effectiveness by overcoming the above-mentioned weaknesses. To fully exploit the information in a graph, PUMA expands the coverage of nodes during graph condensation to include both labelled and unlabelled nodes. Furthermore, a training-from-scratch strategy is proposed to upgrade the previous continual learning scheme so that training is balanced between the historical and the new graphs. In addition, PUMA uses a one-time propagation and wide graph encoders to accelerate the graph condensation and the graph encoding process in the training stage, improving the efficiency of the whole framework. Extensive experiments on six datasets for the node classification task demonstrate state-of-the-art performance and efficiency over existing methods.
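
As a rough illustration of the one-time propagation mentioned in the abstract, the sketch below assumes an SGC-style operator: features are multiplied by a symmetrically normalised adjacency matrix (with self-loops) a fixed number of times, once and up front, so that later condensation and training steps only see a feature matrix and need no edges. The choice of operator, the number of hops, and all names here are assumptions for illustration; the excerpt does not specify the exact propagation used.

```python
import numpy as np


def normalised_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])  # self-loops keep every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def propagate_once(adj, features, hops=2):
    """Precompute hop-wise smoothed features a single time."""
    a_norm = normalised_adjacency(adj)
    out = features
    for _ in range(hops):
        out = a_norm @ out
    return out


# Toy path graph 0 - 1 - 2 with one-hot features (assumption).
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
features = np.eye(3)

# Computed once; an MLP can then be trained on `propagated` without edges.
propagated = propagate_once(adj, features, hops=2)
```

The design point is that the cost of neighbourhood aggregation is paid once per incoming graph rather than at every condensation or training step.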

Paper Structure

This paper contains 38 sections, 16 equations, 8 figures, 9 tables, and 2 algorithms.

Figures (8)

  • Figure 1: Loss and accuracy comparisons for the first two tasks over the first 50 steps of training on the replayed graph of Task 1 while replaying that of Task 0, after the replayed graph of Task 0 has been trained on for 500 steps. Loss values under continual training (dotted lines) and retraining (solid lines) are shown for Task 0 (red) and Task 1 (blue), providing a visual representation of an optimisation problem.
  • Figure 2: Details of edge-free graph condensation with pseudo-labelling and retraining. PUMA first condenses the incoming graph $\mathcal{G}_{k}$ to $\mathcal{\tilde{G}}_{k}$; an extra classifier is then trained on the memory bank $\mathcal{M}_{k}$ to assign pseudo labels to $\mathcal{G}_{k}$. After that, $\mathcal{G}_{k}$ is condensed again to update $\mathcal{M}_{k}$. The MLP model is initialised first and then trained on $\mathcal{G}_{k}$.
  • Figure 3: AP of different memory banks with different budget ratios. All memory banks use both TiM and retraining to obtain the best performance for fairness.
  • Figure 4: Performance matrix visualisation of PUMA with and without the TiM scheme on all four datasets. The coloured square in the $i$-th row and the $j$-th column denotes the classification accuracy on Task $\mathcal{T}_j$ after the model is trained on Task $\mathcal{T}_i$. Light colours indicate high accuracy, and dark colours indicate low accuracy. Read from top to bottom, the $i$-th column shows how the accuracy on Task $\mathcal{T}_i$ changes during the model's continual training.
  • Figure 5: The accuracy changes of the first task when training with the last task on four datasets.
  • ...and 3 more figures
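
The performance matrix described in the Figure 4 caption can be summarised numerically. The sketch below assumes the definitions commonly used in continual learning, which this excerpt does not spell out: average performance (AP) as the mean of the final row, and backward transfer as the mean change in accuracy on each earlier task between the moment it was first learned and the end of training. The matrix values are made up for illustration and are not results from the paper.

```python
def average_performance(matrix):
    """Mean accuracy over all tasks after training on the last task."""
    last_row = matrix[-1]
    return sum(last_row) / len(last_row)


def backward_transfer(matrix):
    """Mean accuracy change on earlier tasks; negative values mean forgetting."""
    n_tasks = len(matrix)
    if n_tasks < 2:
        return 0.0
    gaps = [matrix[-1][j] - matrix[j][j] for j in range(n_tasks - 1)]
    return sum(gaps) / len(gaps)


# Hypothetical lower-triangular matrix: row i holds the accuracy on
# tasks 0..i after the model has been trained on task i.
matrix = [
    [0.90],
    [0.80, 0.85],
    [0.70, 0.80, 0.95],
]

ap = average_performance(matrix)
bwt = backward_transfer(matrix)
```

Reading down a column of `matrix` corresponds to reading down a column of the plotted matrix: it traces how accuracy on one task evolves as later tasks arrive.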