TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics

Lu Yi, Jie Peng, Yanping Zheng, Fengran Mo, Zhewei Wei, Yuhang Ye, Yue Zixuan, Zengfeng Huang

TL;DR

This work argues that existing temporal GNN benchmarks overemphasize edge repetition and neglect complex sequential dynamics essential for real-world future link prediction. It introduces TGB-Seq, an eight-dataset benchmark with low repeat ratios and rich sequential patterns across bipartite and non-bipartite domains, plus public code and leaderboards. Comprehensive experiments show that state-of-the-art temporal GNNs suffer substantial performance drops and high training costs on TGB-Seq, indicating gaps in learning sequential dynamics and generalization to unseen edges. By providing diverse datasets and evaluation tools, TGB-Seq aims to drive the development of more robust and efficient temporal GNN methods with better real-world applicability.

Abstract

Future link prediction is a fundamental challenge in various real-world dynamic systems. To address this, numerous temporal graph neural networks (temporal GNNs) and benchmark datasets have been developed. However, these datasets often feature excessive repeated edges and lack complex sequential dynamics, a key characteristic inherent in many real-world applications such as recommender systems and "Who-To-Follow" on social networks. This oversight has led existing methods to inadvertently downplay the importance of learning sequential dynamics, focusing primarily on predicting repeated edges. In this study, we demonstrate that existing methods, such as GraphMixer and DyGFormer, are inherently incapable of learning simple sequential dynamics, such as "a user who has followed OpenAI and Anthropic is more likely to follow AI at Meta next." Motivated by this issue, we introduce the Temporal Graph Benchmark with Sequential Dynamics (TGB-Seq), a new benchmark carefully curated to minimize repeated edges, challenging models to learn sequential dynamics and generalize to unseen edges. TGB-Seq comprises large real-world datasets spanning diverse domains, including e-commerce interactions, movie ratings, business reviews, social networks, citation networks, and web link networks. Benchmarking experiments reveal that current methods usually suffer significant performance degradation and incur substantial training costs on TGB-Seq, posing new challenges and opportunities for future research. TGB-Seq datasets, leaderboards, and example code are available at https://tgb-seq.github.io/.
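To make the notion of "repeated edges" concrete, here is a minimal Python sketch of how a repeat ratio might be computed for a temporal edge list. It assumes the repeat ratio is the fraction of edges whose (source, destination) pair has already occurred at an earlier point in the stream; the abstract does not state the exact definition, and the function name `repeat_ratio` and the toy edges are hypothetical.

```python
from typing import Iterable, Tuple

Edge = Tuple[int, int, float]  # (source, destination, timestamp)


def repeat_ratio(edges: Iterable[Edge]) -> float:
    """Fraction of edges whose (source, destination) pair appeared earlier.

    Assumed definition for illustration only; not taken from the paper.
    """
    seen = set()
    repeats = 0
    total = 0
    # Process edges in chronological order (stable sort keeps ties in input order).
    for src, dst, _ts in sorted(edges, key=lambda e: e[2]):
        total += 1
        if (src, dst) in seen:
            repeats += 1
        else:
            seen.add((src, dst))
    return repeats / total if total else 0.0


# Hypothetical toy stream: user 0 interacts with item 10 twice.
toy_edges = [(0, 10, 1.0), (0, 11, 2.0), (0, 10, 3.0), (1, 10, 4.0)]
print(repeat_ratio(toy_edges))  # 0.25
```

Under this assumed definition, a dataset with a low repeat ratio forces a model to score mostly unseen (source, destination) pairs, which is the setting TGB-Seq targets.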

Paper Structure

This paper contains 18 sections, 3 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: The MRR scores of three selected temporal GNNs and SGNN-HN on two existing datasets (Wikipedia, Reddit) and two recommendation datasets (Yelp and Taobao).
  • Figure 2: The MRR scores of eight popular temporal GNNs for predicting repeated historical edges on four previously established datasets. "Unseen" denotes performance on unseen edges.
  • Figure 3: Toy example of sequential dynamics in a temporal graph. The bipartite graph consists of users and items. The first user in group $u$, $u_0$, interacts sequentially with items $\{i_k\}_{k=0}^{k=4}$ at times $\{t_k\}_{k=0}^{k=4}$, respectively. Similarly, the first user in group $v$, $v_0$, interacts sequentially with items $\{i_k\}_{k=5}^{k=9}$ at the same timestamps $\{t_k\}_{k=0}^{k=4}$ as $u_0$. The second users, $u_1$ and $v_1$, follow a similar interaction pattern but interact with items at different times than the first users. All other users interact with items in a comparable sequential manner. A test sample queries whether the test node will interact with $i_4$ or $i_9$ at time $t_T$, based on its four previous interactions from $t_{T-4}$ to $t_{T-1}$.
  • Figure 4: The average training cost per epoch of eight popular temporal GNN methods on the GoogleLocal, Patent, and Yelp datasets, which consist of 1.9M, 12.7M, and 19.7M edges, respectively.
  • Figure 5: Distribution of node degree on our TGB-Seq datasets.
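
Figures 1 and 2 report MRR (mean reciprocal rank). As a reminder of what that metric measures, the sketch below implements the standard definition in Python: each query ranks one true destination against a set of negative candidates, and the reciprocal ranks are averaged. The function names, the tie-handling rule (only strictly higher-scoring negatives push the rank down), and the toy scores are assumptions for illustration; this page does not describe the benchmark's actual evaluation protocol.

```python
from typing import List, Sequence, Tuple


def reciprocal_rank(pos_score: float, neg_scores: Sequence[float]) -> float:
    """Reciprocal rank of the positive candidate among the negatives.

    Assumption: ties are resolved in favor of the positive, so only
    negatives with a strictly higher score lower its rank.
    """
    rank = 1 + sum(1 for s in neg_scores if s > pos_score)
    return 1.0 / rank


def mean_reciprocal_rank(queries: List[Tuple[float, Sequence[float]]]) -> float:
    """Average the reciprocal ranks over all (positive, negatives) queries."""
    if not queries:
        raise ValueError("at least one query is required")
    return sum(reciprocal_rank(p, n) for p, n in queries) / len(queries)


# Hypothetical model scores: the first positive is ranked 1st, the second 3rd.
toy_queries = [
    (0.9, [0.1, 0.3]),
    (0.2, [0.5, 0.1, 0.4]),
]
print(mean_reciprocal_rank(toy_queries))  # (1 + 1/3) / 2, roughly 0.667
```

An MRR of 1.0 means the true destination is always ranked first, so lower scores on low-repeat datasets indicate that the true destination is more often outranked by negative candidates.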