On the Generalization Capability of Temporal Graph Learning Algorithms: Theoretical Insights and a Simpler Method

Weilin Cong; Jian Kang; Hanghang Tong; Mehrdad Mahdavi

TL;DR

This paper connects the generalization error of TGL algorithms to "the number of layers/steps" in GNN-/RNN-based TGL methods and to "the feature-label alignment (FLA) score", where FLA can be used as a proxy for expressive power and explains the performance of memory-based methods.

Abstract

Temporal Graph Learning (TGL) has become a prevalent technique across diverse real-world applications, especially in domains where data can be represented as a graph and evolves over time. Although TGL has recently seen notable progress in algorithmic solutions, its theoretical foundations remain largely unexplored. This paper aims to bridge this gap by investigating the generalization ability of different TGL algorithms (e.g., GNN-based, RNN-based, and memory-based methods) under the finite-width over-parameterized regime. We establish the connection between the generalization error of TGL algorithms and "the number of layers/steps" in the GNN-/RNN-based TGL methods and "the feature-label alignment (FLA) score", where FLA can be used as a proxy for the expressive power and explains the performance of memory-based methods. Guided by our theoretical analysis, we propose Simplified-Temporal-Graph-Network, which enjoys a small generalization error, improved overall performance, and lower model complexity. Extensive experiments on real-world datasets demonstrate the effectiveness of our method. Our theoretical findings and proposed algorithm offer essential insights into TGL from a theoretical standpoint, laying the groundwork for the design of practical TGL algorithms in future studies.

Paper Structure (64 sections, 30 theorems, 171 equations, 7 figures, 9 tables, 1 algorithm)

Introduction
Contributions.
Related works and preliminaries
Generalization of temporal graph learning methods
Problem setting for theoretical analysis
Assumptions and main theoretical results
Discussion on the generalization bound: insights and limitations
Dependency on depth and steps.
Dependency on feature-label alignment (FLA).
A simplified algorithm
Simplified temporal graph network: input data and neural architecture
Encoding features via GNN.
Comparison to existing methods
Comparison to TGAT.
Comparison to TGN.
...and 49 more sections

Key Result

Theorem 1

Given any $\delta\in(0,1/e]$, FLA-related constant $R=\mathcal{O}(\sqrt{\mathbf{y}^\top (\mathbf{J} \mathbf{J}^\top )^{-1} \mathbf{y}})$, and number of training iterations $N$ (one training example per iteration), there exists $m^\star=\mathcal{O}(N^2/L^2)\log(1/\delta)$ such that, if the hidden dimension … where $\widetilde{\bm{\theta}}$ is uniformly sampled from $\{ \bm{\theta}_0,\ldots,\bm{\theta}_{N-1}\}$ …
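
As a rough illustration of the FLA-related quantity $\sqrt{\mathbf{y}^\top (\mathbf{J} \mathbf{J}^\top )^{-1} \mathbf{y}}$ that appears in $R$, the following minimal sketch evaluates it with NumPy. It assumes that $\mathbf{J}$ is an $N \times P$ matrix with one row per training example and that $\mathbf{J}\mathbf{J}^\top$ is invertible; the excerpt above does not define $\mathbf{J}$, so this layout is an assumption. The function name `fla_score` and the optional `ridge` stabilizer are hypothetical and are not taken from the paper.

```python
import numpy as np


def fla_score(J, y, ridge=0.0):
    """Evaluate sqrt(y^T (J J^T)^{-1} y).

    J     : (N, P) array, one row per training example (assumed layout).
    y     : (N,) array of labels.
    ridge : optional non-negative value added to the diagonal of J J^T for
            numerical stability (an illustrative addition, not from the paper).
    """
    J = np.asarray(J, dtype=float)
    y = np.asarray(y, dtype=float)
    gram = J @ J.T  # (N, N) Gram matrix J J^T
    if ridge > 0:
        gram = gram + ridge * np.eye(gram.shape[0])
    # Solve (J J^T) alpha = y rather than forming the inverse explicitly.
    alpha = np.linalg.solve(gram, y)
    return float(np.sqrt(y @ alpha))


# Toy check: with J = I the Gram matrix is the identity, so the score is ||y||.
print(fla_score(np.eye(2), [3.0, 4.0]))  # 5.0
```

Solving the linear system instead of inverting the Gram matrix is a numerical convenience; it does not change the quantity being computed.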

Figures (7)

Figure 1: Relationship between the generalization error (in Theorem \ref{theorem:all_generalization}) and empirical average precision score. Generalization error (GE) and average precision (AP) have an inverse correlation, i.e., the larger the GE, the lower the AP. Each marker is one experiment run. The same method's GE changes at each run because it depends on feature-label alignment, which changes with different weight initialization. More details on the computation of GE are deferred to Appendix \ref{section:how to compute FLA}.
Figure 2: An illustration of temporal graph data with nodes $v_1, \ldots,v_5$ and timestamps $t_1, \ldots, t_6$ that indicate when two nodes interact.
Figure 3: Comparison of FLA (y-axis) of different methods (x-axis) on real-world datasets.
Figure 4: Comparison of the average precision on the validation set and the generalization gap of different methods on real-world datasets.
Figure 5: Comparison of feature-label alignment (y-axis) and average precision score (in red text) of different model input selection (x-axis) on real-world datasets.
...and 2 more figures

Theorems & Definitions (55)

Definition 1
Theorem 1
Definition 2: $\omega$-neighborhood \cite{cao2019generalization}
Definition 3: Neural tangent random feature \cite{cao2019generalization}
Lemma 1
Proof of Lemma \ref{lemma:output_change_small}
Lemma 2
Proof of Lemma \ref{lemma:the neural network output is almost linear in W}
Lemma 3
Proof of Lemma \ref{lemma:almost convex}
...and 45 more
