A Selective Learning Method for Temporal Graph Continual Learning

Hanmo Liu, Shimin Di, Haoyang Li, Xun Jian, Yue Wang, Lei Chen

TL;DR

This work defines temporal graph continual learning (TGCL) to address open-class dynamics where new classes emerge and old-class distributions evolve. It proposes Learning Towards the Future (LTF), a selective-learning framework that substitutes old-class data with informative subsets and aligns their embedding distribution to minimize classification error, backed by a theoretical upper bound on error. The method converts the bound into a tractable subset-selection objective, leveraging a greedy, monotone-submodular optimization with MMD-based distribution regularization, and learns with a stop-gradient distillation-like mechanism. Empirical results on Yelp, Reddit, and Amazon demonstrate that LTF outperforms both regularization- and replay-based baselines across TGAT and DyGFormer backbones, offering improved accuracy and reduced forgetting with practical training efficiency.

Abstract

Node classification is a key task in temporal graph learning (TGL). Real-life temporal graphs often introduce new node classes over time, but existing TGL methods assume a fixed set of classes. This assumption is limiting: updating models with the full data is costly, while focusing only on new classes results in forgetting old ones. Graph continual learning (GCL) methods mitigate forgetting using old-class subsets but fail to account for their evolution. We define this novel problem as temporal graph continual learning (TGCL), which focuses on efficiently maintaining up-to-date knowledge of old classes. To tackle TGCL, we propose Learning Towards the Future (LTF), a selective learning framework that substitutes the old-class data with its subsets. We derive an upper bound on the error caused by such replacement and transform it into objectives for selecting and learning subsets that minimize classification error while preserving the distribution of the full old-class data. Experiments on three real-world datasets validate the effectiveness of LTF on TGCL.

Paper Structure

This paper contains 39 sections, 3 theorems, 19 equations, 9 figures, 10 tables, 1 algorithm.

Key Result

Theorem 3.1

Let $\mathcal{G}_N^{old}, \mathcal{G}_N^{sub}$ be the distributions of $G_N^{old}$ and $G_N^{sub}$. Let $h \in \mathcal{H}$ be a function in the hypothesis space $\mathcal{H}$ and $\tilde{h}^{sub}_N$ be the function optimized on $\mathcal{G}_N^{sub}$. The classification error on $\mathcal{G}_N^{old}$ ... where $d_{\mathcal{H}\Delta \mathcal{H}}(\mathcal{G}_a, \mathcal{G}_b)=2\sup_{h\in \mathcal{H}\Delta \mathcal{H}}$ ...

Figures (9)

  • Figure 1: The differences between temporal graph learning (TGL), graph continual learning (GCL) and temporal graph continual learning (TGCL). At a new period, TGL assumes that no new-class data appears, while GCL assumes that old-class data is static. TGCL makes neither assumption and is therefore better suited to real-life temporal graphs.
  • Figure 2: The selective learning framework of LTF on old-class data. From $G_N^{old}$, $G_N^{sub}$ is greedily selected by having the lowest classification error $j_{cls}(\cdot)$ and distribution discrepancy $j_{MMD}(\cdot)$, while $G_N^{sim}$ is greedily selected only by the lowest $j_{MMD}(\cdot)$. Afterwards, $G_N^{sub}$ is learned by minimizing the classification error and aligning the distribution with $G_N^{sim}$.
  • Figure 3: The average precision (AP) of LTF and the baselines at each period based on TGAT.
  • Figure 4: The average forgetting (AF) of LTF and the baselines at each period based on TGAT.
  • Figure 5: Sensitivity on the key hyper-parameters based on TGAT.
  • ...and 4 more figures
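
The greedy selection described in the Figure 2 caption can be illustrated with a small sketch. It is not the paper's algorithm: the objective here is a hypothetical stand-in that adds, at each step, the candidate with the lowest `weight * mean(cls_err[S]) + MMD^2(S, full set)`, using an RBF kernel on precomputed embeddings. The names `greedy_select`, `rbf_gram`, `cls_err`, `weight` and `gamma` are assumptions made for illustration, and the paper's monotone-submodular reformulation is not reproduced. Setting `weight=0` corresponds loosely to selecting by distribution discrepancy alone, as the caption describes for $G_N^{sim}$.

```python
import numpy as np

def rbf_gram(x, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def greedy_select(emb, cls_err, budget, weight=0.0, gamma=1.0):
    """Greedily pick `budget` row indices of `emb` (hypothetical objective).

    Each step adds the candidate c that minimizes
        weight * mean(cls_err over S + {c}) + MMD^2(S + {c}, full set),
    where MMD^2 is the biased estimate under an RBF kernel.
    """
    n = emb.shape[0]
    budget = min(budget, n)
    K = rbf_gram(emb, gamma)
    full_term = K.mean()        # (1/n^2) * sum_ij K_ij, constant across steps
    col_mean = K.mean(axis=1)   # (1/n) * sum_j K_ij for every i
    diag = np.diag(K).copy()

    selected = []
    remaining = np.ones(n, dtype=bool)
    within = 0.0                # sum of K_ij over i, j in S
    cross = 0.0                 # sum of col_mean_i over i in S
    err_sum = 0.0               # sum of cls_err_i over i in S
    sel_sum = np.zeros(n)       # sum of K[i, c] over i in S, for every c

    for _ in range(budget):
        m = len(selected) + 1
        # Running sums if each candidate c were added to S (vectorized over c).
        new_within = within + 2.0 * sel_sum + diag
        new_cross = cross + col_mean
        mmd2 = new_within / m ** 2 - 2.0 * new_cross / m + full_term
        score = mmd2 + weight * (err_sum + cls_err) / m
        score[~remaining] = np.inf          # never pick the same index twice
        best = int(np.argmin(score))

        selected.append(best)
        remaining[best] = False
        within = new_within[best]
        cross = new_cross[best]
        err_sum += cls_err[best]
        sel_sum += K[best]
    return selected
```

With two well-separated clusters and `weight=0`, a budget of two picks one point from each cluster, because a second point from the same cluster would leave half of the full distribution unrepresented. A large `weight` makes the per-node error dominate, so the lowest-error nodes are chosen regardless of coverage.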

Theorems & Definitions (6)

  • Theorem 3.1
  • Lemma 3.1
  • proof
  • Theorem 3.1
  • proof
  • Definition: Radial Basis Function Kernel
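
As a rough illustration of how a radial basis function kernel feeds into an MMD-style distribution discrepancy, the sketch below computes the kernel on two sets of embeddings and the biased estimate of squared MMD. The function names `rbf_kernel` and `mmd2`, the `gamma` parameterization $k(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$, and the choice of the biased estimator are assumptions made here; the paper's own definition may use a different bandwidth convention.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Pairwise RBF kernel: out[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    d2 = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(d2, 0.0))

def mmd2(x, y, gamma=1.0):
    """Biased estimate of squared maximum mean discrepancy between samples x and y."""
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Usage: two samples from the same distribution versus a shifted one.
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 4))
same = rng.normal(size=(50, 4))
shifted = rng.normal(size=(50, 4)) + 3.0
near = mmd2(a, same, gamma=0.1)
far = mmd2(a, shifted, gamma=0.1)
```

In this usage example `far` comes out larger than `near`, which is the property a distribution-alignment term relies on: the value shrinks as the two embedding sets become harder to tell apart.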