PUMA: Efficient Continual Graph Learning for Node Classification with Graph Condensation
Yilun Liu, Ruihong Qiu, Yanran Tang, Hongzhi Yin, Zi Huang
TL;DR
PUMA tackles catastrophic forgetting in continual graph learning by extending CaT with a pseudo-label guided memory bank that includes unlabelled nodes, a retraining strategy to balance knowledge between old and new graphs, and efficiency enhancements via one-time propagation, wide graph encoders, and edge-free memories trained with an MLP. It uses distribution matching and MMD-based objectives to condense streaming graphs into informative, compact replay data, enabling effective learning from limited memory budgets. Extensive experiments across six datasets show state-of-the-art or competitive performance with significantly improved efficiency, while ablations confirm the value of pseudo-labeling and retraining. The approach offers a practical, scalable solution for continual node classification on evolving graphs, with clear gains in memory efficiency and learning stability in real-world streaming scenarios.
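As a rough illustration of the distribution-matching idea mentioned above, the sketch below computes a class-wise squared distance between the mean embeddings of real nodes and condensed (memory) nodes, which corresponds to an empirical MMD with a linear kernel. The function name `class_mean_mmd`, the use of NumPy, and the assumption that every class in the real data also appears in the memory are illustrative choices, not details taken from the paper.

```python
import numpy as np

def class_mean_mmd(real_emb, real_labels, syn_emb, syn_labels):
    """Sum, over classes, of the squared distance between mean embeddings.

    Hypothetical helper: an empirical MMD with a linear kernel, computed
    per class. Assumes every class in real_labels also occurs in syn_labels.
    """
    loss = 0.0
    for c in np.unique(real_labels):
        real_mean = real_emb[real_labels == c].mean(axis=0)
        syn_mean = syn_emb[syn_labels == c].mean(axis=0)
        loss += float(np.sum((real_mean - syn_mean) ** 2))
    return loss

# Toy data: four real nodes, two condensed nodes, two classes.
real_emb = np.array([[0.0, 0.0], [2.0, 2.0], [1.0, 1.0], [3.0, 3.0]])
real_labels = np.array([0, 0, 1, 1])
syn_emb = np.array([[1.0, 1.0], [2.0, 2.0]])
syn_labels = np.array([0, 1])

matched = class_mean_mmd(real_emb, real_labels, syn_emb, syn_labels)
shifted = class_mean_mmd(real_emb, real_labels, syn_emb + np.array([1.0, 0.0]), syn_labels)
```

In a condensation loop, a loss of this kind would be minimised with respect to the condensed node features, typically under several randomly initialised encoders; that optimisation step is omitted here.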
Abstract
When handling streaming graphs, existing graph representation learning models encounter a catastrophic forgetting problem, where previously learned knowledge is easily overwritten when learning from newly incoming graphs. In response, Continual Graph Learning (CGL) has emerged as a novel paradigm enabling graph representation learning from streaming graphs. Our prior work, Condense and Train (CaT), is a replay-based CGL framework with a balanced continual learning procedure, which designs a small yet effective memory bank for replaying. Although CaT alleviates the catastrophic forgetting problem, three issues remain: (1) the graph condensation focuses only on labelled nodes while neglecting the abundant information carried by unlabelled nodes; (2) the continual training scheme of CaT overemphasises previously learned knowledge, limiting the model's capacity to learn from newly added memories; (3) both the condensation process and the replaying process of CaT are time-consuming. In this paper, we propose a PsUdo-label guided Memory bAnk (PUMA) CGL framework, which extends CaT and improves its efficiency and effectiveness by addressing these weaknesses. To fully exploit the information in a graph, PUMA expands the coverage of nodes during graph condensation to include both labelled and unlabelled nodes. Furthermore, a training-from-scratch strategy is proposed to upgrade the previous continual learning scheme, balancing training between the historical and the new graphs. In addition, PUMA uses one-time propagation and wide graph encoders to accelerate graph condensation and graph encoding in the training stage, improving the efficiency of the whole framework. Extensive experiments on six datasets for the node classification task demonstrate state-of-the-art performance and improved efficiency over existing methods.
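One plausible reading of the one-time propagation mentioned in the abstract is that neighbourhood aggregation is computed once per incoming graph and cached, so that later condensation and training steps operate on the propagated features without touching the edges again. The sketch below illustrates that reading only; the function name `propagate_once`, the symmetric normalisation with self-loops, and the dense NumPy representation are assumptions rather than details confirmed by the paper.

```python
import numpy as np

def propagate_once(adj, features, hops=2):
    """Precompute multi-hop aggregated features a single time.

    Hypothetical helper: uses the symmetric normalisation
    D^{-1/2} (A + I) D^{-1/2}, applied `hops` times to the features.
    """
    a = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_norm = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    out = features
    for _ in range(hops):
        out = a_norm @ out
    return out

# Two connected nodes with scalar features 1 and 3.
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
features = np.array([[1.0], [3.0]])
cached = propagate_once(adj, features, hops=2)
```

The cached matrix can then be fed to an edge-free model such as an MLP, which is where the time saving would come from: the propagation cost is paid once instead of at every condensation or training step.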
