Towards Ideal Temporal Graph Neural Networks: Evaluations and Conclusions after 10,000 GPU Hours

Yuxin Yang, Hongkuan Zhou, Rajgopal Kannan, Viktor Prasanna

TL;DR

Temporal Graph Neural Networks (TGNNs) face a large design space and challenging runtime constraints. The authors propose a modular, optimized benchmarking framework that evaluates three TGNN components (neighbor sampling, node memory, and neighbor aggregator) within a unified code base, spending over 10,000 GPU hours across seven datasets. They find that the most recent neighbor sampling and attention-based aggregators outperform uniform neighbor sampling and MLP-Mixer aggregators. Static node memory is an effective alternative to dynamic node memory, and the better choice depends on the dataset's repetition patterns: RNN memory is favored for short-term repetition and embedding memory for long-term repetition. The work highlights meaningful interactions between modules, shows limited gains from deeper sampling when memory is present, and provides practical guidance for designing more general and efficient TGNNs.

Abstract

Temporal Graph Neural Networks (TGNNs) have emerged as powerful tools for modeling dynamic interactions across various domains. The design space of TGNNs is notably complex, given the unique challenges in runtime efficiency and scalability raised by the evolving nature of temporal graphs. We contend that many of the existing works on TGNN modeling inadequately explore the design space, leading to suboptimal designs. Viewing TGNN models through a performance-focused lens often obstructs a deeper understanding of the advantages and disadvantages of each technique. Specifically, benchmarking efforts inherently evaluate models in their original designs and implementations, resulting in unclear accuracy comparisons and misleading runtimes. To address these shortcomings, we propose a practical comparative evaluation framework that performs a design space search across well-known TGNN modules based on a unified, optimized code implementation. Using our framework, we make the first efforts towards addressing three critical questions in TGNN design, spending over 10,000 GPU hours: (1) investigating the efficiency of TGNN module designs, (2) analyzing how the effectiveness of these modules correlates with dataset patterns, and (3) exploring the interplay between multiple modules. Key outcomes of this directed investigative approach include demonstrating that the most recent neighbor sampling and the attention aggregator outperform uniform neighbor sampling and the MLP-Mixer aggregator, assessing static node memory as an effective node memory alternative, and showing that the choice between static and dynamic node memory should be based on the repetition patterns in the dataset. Our in-depth analysis of the interplay between TGNN modules and dataset patterns should provide deeper insight into TGNN performance, along with potential research directions for designing more general and effective TGNNs.

Paper Structure

This paper contains 24 sections, 7 equations, 12 figures, 4 tables, 2 algorithms.

Figures (12)

  • Figure 1: Overview of our generalized update-sampling-aggregation TGNN pipeline. We focus our discussion on the three modules in grey rectangles: node memory, neighbor sampling, and neighbor aggregator. We use small circles to represent nodes and rectangles to represent edge embeddings. Arrows indicate the flow of data, encompassing transformations and rearrangements. In the neighbor sampling module, dotted dark grey rectangles represent dummy nodes that fill in where too few valid neighbors are available. The circled numbers illustrate the sequence of task execution, determined by task dependencies. Tasks that share the same number can be executed in parallel without dependency issues.
  • Figure 2: Performance gap between two neighbor samplers across various module combinations on datasets WIKI and REDDIT. The values are calculated by subtracting the prediction accuracy of models using uniform sampling from the prediction accuracy of models using the most recent sampling. Results are plotted with varying numbers of sampled neighbors for 1-layer models.
  • Figure 3: Performance of different combinations of TGNN modules on datasets Wikipedia and REDDIT. As more neighbors are sampled in models with one layer (first row) and two layers (second row), certain models display enhanced accuracy, while the top-performing model saturates quickly.
  • Figure 4: MRRs and runtimes of 1-layer models sampling an increasing number of neighbors on datasets UCI, Wikipedia, REDDIT, and Flights. We use the atten neighbor aggregator and MR sampling on all datasets. For node memory, we adopt the better-performing option on each dataset: emb memory on REDDIT and Flights, and RNN memory on the other datasets. MRR is shown as a line chart and runtime as a bar chart on a logarithmic scale. Note that the x-axis uses a scale that has been slightly adjusted for visual tidiness.
  • Figure 5: MRR performance of RNN-based and embedding table-based node memory on all datasets. We fix the neighbor sampling to the most recent strategy and the neighbor aggregator to the attention aggregator. 1-layer models are used with 5-100 neighbors on all datasets. The red diagonal dashed line marks equal performance of RNN-based and embedding table-based node memory.
  • ...and 7 more figures
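
The captions of Figures 1 and 2 refer to two neighbor sampling strategies (most recent and uniform) and to dummy nodes that fill in when a node has too few valid neighbors. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the `(neighbor_id, timestamp)` layout, the use of `None` as the dummy entry, the strict "before the query time" rule, and sampling without replacement are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical layout: one past interaction is (neighbor_id, timestamp).
Event = Tuple[int, float]


def past_neighbors(history: List[Event], t: float) -> List[Event]:
    """Return interactions strictly before query time t, oldest first.

    Assumption: a neighbor is valid only if its timestamp is < t.
    """
    return sorted((e for e in history if e[1] < t), key=lambda e: e[1])


def pad(sample: List[Event], k: int) -> List[Optional[Event]]:
    """Fill unused slots with None, standing in for dummy nodes."""
    return list(sample) + [None] * (k - len(sample))


def most_recent_sample(history: List[Event], t: float, k: int) -> List[Optional[Event]]:
    """Pick the k latest valid neighbors, newest first, padded to length k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    valid = past_neighbors(history, t)
    return pad(valid[-k:][::-1], k)


def uniform_sample(history: List[Event], t: float, k: int,
                   rng: random.Random) -> List[Optional[Event]]:
    """Pick up to k valid neighbors uniformly at random, padded to length k.

    Assumption: sampling is done without replacement.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    valid = past_neighbors(history, t)
    return pad(rng.sample(valid, min(k, len(valid))), k)


history = [(1, 1.0), (2, 2.0), (3, 3.0), (4, 5.0)]
print(most_recent_sample(history, 4.0, 2))  # [(3, 3.0), (2, 2.0)]
print(most_recent_sample(history, 2.5, 3))  # [(2, 2.0), (1, 1.0), None]
print(uniform_sample(history, 4.0, 2, random.Random(0)))
```

Both samplers return exactly `k` slots, so a downstream aggregator can work on fixed-size inputs and mask out the `None` entries. A real pipeline would likely batch this over many nodes and store the padding as a mask tensor.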