E-CGL: An Efficient Continual Graph Learner

Jianhao Guo, Zixuan Ni, Yun Zhu, Siliang Tang

TL;DR

The paper tackles continual graph learning by addressing two core challenges: interdependencies among evolving graphs and scalability on large graphs. It introduces E-CGL, which combines Graph Dependent Replay, built on a graph-aware sampling strategy that accounts for both node importance and diversity, with an Efficient Graph Learner that trains an MLP sharing its weights with a GCN and reintroduces graph structure only during inference. On four large node-classification datasets under both Task-IL and Class-IL settings, E-CGL surpasses nine baselines, reduces forgetting (average AF around $-1.1\%$), and delivers substantial speedups (on average $15.83\times$ for training and $4.89\times$ for inference). The result is a practical, scalable approach to continual graph learning on evolving graphs, with publicly available code.

Abstract

Continual learning has emerged as a crucial paradigm for learning from sequential data while preserving previous knowledge. Continual graph learning, where graphs evolve continuously with streaming graph data, presents unique challenges that require adaptive and efficient graph learning methods in addition to the problem of catastrophic forgetting. The first challenge arises from the interdependencies between different graph data, where previous graphs can influence new data distributions. The second challenge is efficiency when dealing with large graphs. To address these two problems, we propose an Efficient Continual Graph Learner (E-CGL) in this paper. We tackle the interdependency issue by demonstrating the effectiveness of replay strategies and introducing a combined sampling strategy that considers both node importance and diversity. To overcome the efficiency limitation, E-CGL leverages a simple yet effective MLP model that shares weights with a GCN during training, achieving acceleration by circumventing the computationally expensive message passing process. Our method surpasses nine baselines on four graph continual learning datasets under two settings, while reducing catastrophic forgetting to an average of -1.1%. Additionally, E-CGL achieves an average of 15.83x training time acceleration and 4.89x inference time acceleration across the four datasets. These results indicate that E-CGL not only effectively manages the correlation between different graph data during continual training but also enhances the efficiency of continual learning on large graphs. The code is publicly available at https://github.com/aubreygjh/E-CGL.
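
The weight-sharing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`matmul`, `normalized_adjacency`, `mlp_forward`, `gcn_forward`), the row-normalised adjacency, and the toy sizes are all assumptions made for illustration. The point it shows is that one set of layer weights can drive a cheap MLP forward pass (no message passing) and a GCN-style forward pass that additionally aggregates over neighbours.

```python
import random

def matmul(a, b):
    """Dense matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def normalized_adjacency(edges, n):
    """Row-normalised adjacency with self-loops, D^-1 (A + I).
    Assumption: a simplification of the symmetric GCN normalisation."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    return [[v / sum(row) for v in row] for row in adj]

def mlp_forward(x, weights):
    """Path without message passing: dense layers only."""
    h = x
    for i, w in enumerate(weights):
        h = matmul(h, w)
        if i < len(weights) - 1:
            h = relu(h)
    return h

def gcn_forward(x, adj_norm, weights):
    """Path with message passing: the same weights, plus neighbour aggregation."""
    h = x
    for i, w in enumerate(weights):
        h = matmul(adj_norm, matmul(h, w))
        if i < len(weights) - 1:
            h = relu(h)
    return h

# Toy example (hypothetical sizes): 4 nodes, 3 features, 5 hidden units, 2 classes.
rng = random.Random(0)
x = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
shared_weights = [
    [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(3)],
    [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(5)],
]
adj_norm = normalized_adjacency([(0, 1), (1, 2), (2, 3)], 4)

logits_mlp = mlp_forward(x, shared_weights)            # structure-free pass
logits_gcn = gcn_forward(x, adj_norm, shared_weights)  # structure-aware pass
```

Because the two paths differ only by the multiplication with `adj_norm`, they coincide on a graph with no edges, and the MLP path avoids the per-layer aggregation cost that grows with the number of edges.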

Paper Structure

This paper contains 36 sections, 19 equations, 4 figures, 6 tables, 1 algorithm.

Figures (4)

  • Figure 1: Left: Visualization of continual graph learning. Right: Illustration of conditional probabilities on continual graph learning. The grey lines show Bayes rule for independent identically distributed data. The red line represents the influence of previous data on the current graph.
  • Figure 2: Parameter sensitivity analysis of E-CGL; the lightly shaded regions show the variance. Left: diversity sampling ratio. Middle: sampling budget for Graph Dependent Replay. Right: loss weight $\lambda$.
  • Figure 3: Visualization: Learning curves of AA over task sequences. Note: The curve for joint training on OGBN-Products is unavailable due to resource limitations.
  • Figure 4: Visualization: Performance matrices on CoraFull, OGBN-Arxiv, Reddit, and OGBN-Products.
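
The AA curves and performance matrices in Figures 3 and 4 can be related with a small sketch. This page does not define AA or AF, so the code assumes the definitions commonly used in continual learning: entry `perf[i][j]` is the accuracy on task `j` after training on task `i`, AA averages the last row, and AF averages the change on each earlier task between the time it was learned and the end of training (negative values mean forgetting). The function names and the example matrix are hypothetical and are not results from the paper.

```python
def average_accuracy(perf):
    """Mean accuracy over all tasks after training on the final task."""
    last_row = perf[-1]
    return sum(last_row) / len(last_row)

def average_forgetting(perf):
    """Mean of perf[T-1][j] - perf[j][j] over all tasks j except the last.
    Negative values indicate that earlier tasks were partly forgotten."""
    num_tasks = len(perf)
    if num_tasks < 2:
        return 0.0  # nothing can be forgotten with a single task
    deltas = [perf[-1][j] - perf[j][j] for j in range(num_tasks - 1)]
    return sum(deltas) / len(deltas)

# Made-up lower-triangular performance matrix (accuracy in %), three tasks.
perf = [
    [90.0, 0.0, 0.0],
    [88.0, 85.0, 0.0],
    [87.0, 84.0, 80.0],
]

aa = average_accuracy(perf)    # (87 + 84 + 80) / 3
af = average_forgetting(perf)  # ((87 - 90) + (84 - 85)) / 2
```

Under these assumed definitions, an AF close to zero, such as the reported average of around -1.1%, means the final accuracy on earlier tasks stays close to the accuracy measured right after each task was learned.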