Topology-aware Embedding Memory for Continual Learning on Expanding Networks

Xikun Zhang; Dongjin Song; Yixin Chen; Dacheng Tao

TL;DR

This work tackles memory explosion in replay-based continual learning on expanding graphs by introducing PDGNNs with Topology-aware Embedding Memory (TEM). It decouples trainable parameters from topology, encoding computation ego-subnetworks into fixed-size topology embeddings (TEs) and storing them in TEM, reducing buffer complexity from $O(n d^L)$ to $O(n)$ while preserving crucial topological information for replay. A pseudo-training effect is established, indicating that replaying a TE influences neighboring nodes, which motivates a coverage-maximization sampling strategy to maximize subnetwork coverage under tight memory budgets. Empirical results on four large graph datasets demonstrate that PDGNNs-TEM consistently outperform state-of-the-art baselines in class-IL and task-IL scenarios, with favorable memory-footprint trade-offs and interpretable embeddings.
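To make the $O(n d^L)$ versus $O(n)$ gap concrete, the following is a rough back-of-the-envelope sketch. The function name `buffered_units` and the numbers passed to it are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
def buffered_units(n: int, d: int, L: int) -> tuple[int, int]:
    """Rough count of node-sized units held in the replay buffer.

    n: memory budget (number of buffered nodes)
    d: average node degree
    L: radius of the GNN receptive field
    """
    # Buffering ego-subnetworks: each of the n nodes brings roughly d^L neighbors.
    subnetwork_cost = n * d ** L
    # Buffering topology embeddings: one fixed-size vector per buffered node.
    te_cost = n
    return subnetwork_cost, te_cost


# Hypothetical setting: 400 buffered nodes, average degree 10, 2-layer GNN.
sub_cost, te_cost = buffered_units(n=400, d=10, L=2)
print(sub_cost, te_cost, sub_cost // te_cost)
```

Under these assumed values the ego-subnetwork buffer is $d^L = 100$ times larger than the embedding buffer, and the ratio grows exponentially with the receptive-field radius.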

Abstract

Memory-replay-based techniques have shown great success for continual learning with incrementally accumulated Euclidean data. Directly applying them to continually expanding networks, however, leads to a potential memory explosion problem, because both the representative nodes and their associated topological neighborhood structures must be buffered. To this end, we systematically analyze the key challenges in the memory explosion problem and present a general framework, i.e., Parameter Decoupled Graph Neural Networks (PDGNNs) with Topology-aware Embedding Memory (TEM), to tackle this issue. The proposed framework not only reduces the memory space complexity from $\mathcal{O}(nd^L)$ to $\mathcal{O}(n)$ (where $n$ is the memory budget, $d$ the average node degree, and $L$ the radius of the GNN receptive field), but also fully utilizes the topological information for memory replay. Specifically, PDGNNs decouple trainable parameters from the computation ego-subnetwork via Topology-aware Embeddings (TEs), which compress ego-subnetworks into compact vectors to reduce memory consumption. Based on this framework, we discover a unique pseudo-training effect in continual learning on expanding networks, and this effect motivates us to develop a novel coverage maximization sampling strategy that can enhance performance under a tight memory budget. Thorough empirical studies demonstrate that, by tackling the memory explosion problem and incorporating topological information into memory replay, PDGNNs with TEM significantly outperform state-of-the-art techniques, especially in the challenging class-incremental setting.
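The idea of compressing an ego-subnetwork into a fixed-size vector can be illustrated with a minimal sketch. It assumes one possible parameter-free encoder, namely repeated propagation over a symmetrically normalized adjacency matrix with self-loops (in the style of simplified graph convolution); the abstract does not specify the encoder, so this choice, the function name `topology_aware_embeddings`, and the toy graph are all assumptions rather than the paper's instantiation.

```python
import numpy as np


def topology_aware_embeddings(adj: np.ndarray, feats: np.ndarray, hops: int) -> np.ndarray:
    """Parameter-free neighborhood aggregation (assumed encoder, not the paper's).

    Row v of the result summarizes the `hops`-hop ego-subnetwork of node v
    in a vector whose size does not depend on the neighborhood size.
    """
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    out = feats.astype(float)
    for _ in range(hops):                         # no trainable weights involved
        out = a_norm @ out
    return out


# Toy path graph 0-1-2-3 with one-hot node features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.eye(4)
tes = topology_aware_embeddings(adj, feats, hops=2)

# The memory keeps only the embedding of each selected node, not its neighbors.
memory = {v: tes[v].copy() for v in (0, 3)}
print(memory[0])
```

Because the encoder has no trainable parameters, the stored vectors stay valid while the downstream trainable function keeps changing, which is what lets a replay buffer hold embeddings instead of subgraphs.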

Paper Structure (22 sections, 1 theorem, 11 equations, 5 figures, 4 tables, 1 algorithm)

Introduction
Related Works
Continual Learning & Continual Learning on Expanding Networks
GNNs & Reservoir Computing
Parameter Decoupled GNNs with Topology-aware Embedding Memory
Preliminaries
Memory Replay Meets GNNs
Parameter Decoupled GNNs with TEM
Instantiations of PDGNNs
Pseudo-training Effects of TEs
Pseudo-training Effect and Network Homophily
Coverage Maximization Sampling
Experiments
Datasets
Experimental Setup and Model Evaluation
...and 7 more sections

Key Result

Theorem 1

Given a node $v$, its computation ego-subnetwork $\mathcal{G}^{sub}_v$, the TE $\mathbf{e}_v$, and label $\mathbf{y}_v$ (suppose $v$ belongs to class $k$, i.e., $\mathbf{y}_{v,k}=1$), training PDGNNs with $\mathbf{e}_v$ has the following two properties: 1. It is equivalent to training

Figures (5)

Figure 1: Learning dynamics in an expanding network. We depict new types of nodes with different colors. The new task consisting of new types of nodes may exhibit a different distribution from existing ones. Consequently, as the model adapts to these new types of nodes, it may undergo a significant performance degradation on existing tasks, a phenomenon known as catastrophic forgetting.
Figure 2: (a) ER-GNN [zhou2021overcoming], which stores the input attributes of individual nodes. (b) Sparsified Subgraph Memory (SSM) [zhang2022sparsified], which stores sparsified computation ego-subnetworks. (c) Our PDGNNs with TEM. The incoming computation ego-subnetworks are embedded as TEs and then fed into the trainable function. The stored TEs are sampled based on their coverage ratio (see the section "Coverage Maximization Sampling").
Figure 3: Illustration of the coverage ratio. Supposing the network has $N$ nodes, $R_c(\{u\})=\frac{13}{N}$, $R_c(\{v\})=\frac{15}{N}$, $R_c(\{w\})=\frac{14}{N}$, and $R_c(\{u,v,w\})=\frac{42}{N}$.
Figure 4: Dynamics of average accuracy in the class-IL scenario. (a) CoraFull, 2 classes per task, 35 tasks. (b) OGB-Arxiv, 2 classes per task, 20 tasks. (c) Reddit, 2 classes per task, 20 tasks. (d) OGB-Products, 2 classes per task, 23 tasks.
Figure 6: Visualization of the node embeddings of different classes of Reddit, after learning 1, 10, and 20 tasks. From the top to the bottom, we show the results of Fine-tune, ER-GNN, and PDGNNs-TEM. Each color corresponds to a class.
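The coverage ratio illustrated in Figure 3 can be sketched in a few lines. The code below assumes that $R_c(S)$ is the fraction of the network's nodes falling inside the union of the computation ego-subnetworks of the nodes in $S$, and that an ego-subnetwork includes its center node; this reading is consistent with the caption's numbers but is not spelled out there. The sampler is likewise only one plausible reading of coverage maximization sampling (drawing nodes with probability proportional to their individual coverage ratio), and all function names and the toy graph are hypothetical.

```python
import random
from collections import deque


def ego_nodes(adj_list, v, hops):
    """Nodes within `hops` hops of v, including v itself (breadth-first search)."""
    seen = {v}
    frontier = deque([(v, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue
        for nb in adj_list[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen


def coverage_ratio(adj_list, nodes, hops):
    """Assumed definition: |union of ego-subnetworks of `nodes`| / N."""
    covered = set()
    for v in nodes:
        covered |= ego_nodes(adj_list, v, hops)
    return len(covered) / len(adj_list)


def coverage_weighted_sample(adj_list, candidates, budget, hops, seed=0):
    """Draw up to `budget` distinct nodes, favoring nodes with larger coverage."""
    rng = random.Random(seed)
    pool = list(candidates)
    chosen = []
    while pool and len(chosen) < budget:
        weights = [coverage_ratio(adj_list, [v], hops) for v in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        chosen.append(pick)
        pool.remove(pick)
    return chosen


# Toy path graph 0-1-2-3-4-5.
adj_list = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(coverage_ratio(adj_list, [0, 5], hops=1))
print(coverage_weighted_sample(adj_list, adj_list.keys(), budget=2, hops=1))
```

When the ego-subnetworks of the selected nodes do not overlap, the ratios simply add up, as in the caption where $13 + 15 + 14 = 42$; overlapping neighborhoods contribute less, which is why spreading the buffered nodes across the network pays off under a tight budget.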

Theorems & Definitions (3)

Definition 1: Topology-aware embedding
Theorem 1: Pseudo-training
Definition 2
