
On the Generalization Capability of Temporal Graph Learning Algorithms: Theoretical Insights and a Simpler Method

Weilin Cong, Jian Kang, Hanghang Tong, Mehrdad Mahdavi

TL;DR

This paper establishes the connection between the generalization error of TGL algorithms and (i) the number of layers/steps in GNN-/RNN-based TGL methods and (ii) the feature-label alignment (FLA) score, where FLA can be used as a proxy for expressive power and explains the performance of memory-based methods.

Abstract

Temporal Graph Learning (TGL) has become a prevalent technique across diverse real-world applications, especially in domains where data can be represented as a graph and evolves over time. Although TGL has recently seen notable progress in algorithmic solutions, its theoretical foundations remain largely unexplored. This paper aims to bridge this gap by investigating the generalization ability of different TGL algorithms (e.g., GNN-based, RNN-based, and memory-based methods) under the finite-width over-parameterized regime. We establish the connection between the generalization error of TGL algorithms and "the number of layers/steps" in the GNN-/RNN-based TGL methods and "the feature-label alignment (FLA) score", where FLA can be used as a proxy for the expressive power and explains the performance of memory-based methods. Guided by our theoretical analysis, we propose Simplified-Temporal-Graph-Network, which enjoys a small generalization error, improved overall performance, and lower model complexity. Extensive experiments on real-world datasets demonstrate the effectiveness of our method. Our theoretical findings and proposed algorithm offer essential insights into TGL from a theoretical standpoint, laying the groundwork for designing practical TGL algorithms in future studies.

Paper Structure (64 sections, 30 theorems, 171 equations, 7 figures, 9 tables, 1 algorithm)

Key Result

Theorem 1

Given any $\delta\in(0,1/e]$, FLA-related constant $R=\mathcal{O}(\sqrt{\mathbf{y}^\top (\mathbf{J} \mathbf{J}^\top )^{-1} \mathbf{y}})$, and number of training iterations $N$ (one training example per iteration), there exists $m^\star=\mathcal{O}(N^2/L^2)\log(1/\delta)$ such that, if the hidden dimension [...] where $\widetilde{\bm{\theta}}$ is uniformly sampled from $\{ \bm{\theta}_0,\ldots,\bm{\theta}_{N-1}\}$ [...]
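
To make the FLA-related quantity in Theorem 1 concrete, the following is a minimal sketch of how $\sqrt{\mathbf{y}^\top (\mathbf{J} \mathbf{J}^\top )^{-1} \mathbf{y}}$ could be evaluated for a small problem. It is not the authors' procedure (the paper defers that to its appendix). The function name `fla_constant`, the `ridge` parameter, and the assumption that $\mathbf{J}$ is supplied as one gradient row per training example are all hypothetical choices made for illustration. The constant hidden by $\mathcal{O}(\cdot)$ is ignored.

```python
import math


def fla_constant(J, y, ridge=0.0):
    """Return sqrt(y^T (J J^T)^{-1} y) for J given as N rows of length P.

    `ridge` adds a small value to the diagonal of J J^T for numerical
    stability; it is an illustrative assumption, not part of the theorem.
    """
    n = len(y)
    # Gram matrix K = J J^T: one entry per pair of training examples.
    K = [[sum(a * b for a, b in zip(J[i], J[j])) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        K[i][i] += ridge

    # Cholesky factorisation K = C C^T with C lower triangular.
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("J J^T is not positive definite; try ridge > 0")
                C[i][j] = math.sqrt(s)
            else:
                C[i][j] = s / C[j][j]

    # Forward substitution solves C z = y, so y^T K^{-1} y = z^T z.
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(C[i][k] * z[k] for k in range(i))) / C[i][i]
    return math.sqrt(sum(v * v for v in z))


if __name__ == "__main__":
    # Toy data: two training examples, two parameters.
    J = [[1.0, 0.0], [1.0, 1.0]]
    y = [1.0, 1.0]
    print(fla_constant(J, y))
```

In this sketch a smaller return value corresponds to labels that are better aligned with the gradient features. The Cholesky route avoids forming the inverse explicitly, which matters when $\mathbf{J}\mathbf{J}^\top$ is poorly conditioned.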

Figures (7)

  • Figure 1: Relationship between the generalization error (in Theorem \ref{theorem:all_generalization}) and empirical average precision score. Generalization error (GE) and average precision (AP) have an inverse correlation, i.e., the larger the GE, the lower the AP. Each marker is one experiment run. The same method's GE changes at each run because it depends on feature-label alignment, which changes with different weight initialization. More details on the computation of GE are deferred to Appendix \ref{section:how to compute FLA}.
  • Figure 2: An illustration of temporal graph data with nodes $v_1, \ldots,v_5$ and timestamps $t_1, \ldots, t_6$ that indicate when two nodes interact.
  • Figure 3: Comparison of FLA (y-axis) of different methods (x-axis) on real-world datasets.
  • Figure 4: Comparison of the average precision on the validation set and the generalization gap of different methods on real-world datasets.
  • Figure 5: Comparison of feature-label alignment (y-axis) and average precision score (in red text) of different model input selection (x-axis) on real-world datasets.
  • ...and 2 more figures

Theorems & Definitions (55)

  • Definition 1
  • Theorem 1
  • Definition 2: $\omega$-neighborhood \cite{cao2019generalization}
  • Definition 3: Neural tangent random feature \cite{cao2019generalization}
  • Lemma 1
  • Proof of Lemma \ref{lemma:output_change_small}
  • Lemma 2
  • Proof of Lemma \ref{lemma:the neural network output is almost linear in W}
  • Lemma 3
  • Proof of Lemma \ref{lemma:almost convex}
  • ...and 45 more