Towards Ideal Temporal Graph Neural Networks: Evaluations and Conclusions after 10,000 GPU Hours
Yuxin Yang, Hongkuan Zhou, Rajgopal Kannan, Viktor Prasanna
TL;DR
Temporal Graph Neural Networks (TGNNs) face a large design space and challenging runtime constraints. The authors propose a modular, optimized benchmarking framework that evaluates TGNN components (neighbor sampling, node memory, and neighbor aggregator) within a unified code base, spending over 10,000 GPU hours across seven datasets. They find that most-recent neighbor sampling and attention-based aggregators outperform uniform neighbor sampling and MLP-Mixer aggregators. Static node memory is an effective alternative to dynamic node memory, and which memory works best depends on the repetition patterns in the dataset: RNN memory is favored for short-term repetition and embedding memory for long-term repetition. The work also highlights meaningful interactions between modules, shows limited gains from deeper sampling when memory is present, and provides practical guidance for designing more general and efficient TGNNs.
Abstract
Temporal Graph Neural Networks (TGNNs) have emerged as powerful tools for modeling dynamic interactions across various domains. The design space of TGNNs is notably complex, given the unique challenges in runtime efficiency and scalability raised by the evolving nature of temporal graphs. We contend that many of the existing works on TGNN modeling inadequately explore the design space, leading to suboptimal designs. Viewing TGNN models through a performance-focused lens often obstructs a deeper understanding of the advantages and disadvantages of each technique. Specifically, benchmarking efforts inherently evaluate models in their original designs and implementations, resulting in unclear accuracy comparisons and misleading runtime measurements. To address these shortcomings, we propose a practical comparative evaluation framework that performs a design space search across well-known TGNN modules based on a unified, optimized code implementation. Using our framework, we make the first efforts towards addressing three critical questions in TGNN design, spending over 10,000 GPU hours: (1) investigating the efficiency of TGNN module designs, (2) analyzing how the effectiveness of these modules correlates with dataset patterns, and (3) exploring the interplay between multiple modules. Key outcomes of this directed investigative approach include demonstrating that most-recent neighbor sampling and the attention aggregator outperform uniform neighbor sampling and the MLP-Mixer aggregator; assessing static node memory as an effective node memory alternative; and showing that the choice between static and dynamic node memory should be based on the repetition patterns in the dataset. Our in-depth analysis of the interplay between TGNN modules and dataset patterns should provide deeper insight into TGNN performance, along with potential research directions for designing more general and effective TGNNs.
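As an illustration of the two neighbor sampling strategies compared above, the following is a minimal Python sketch and not the authors' implementation. The function names, the `Event` tuple layout, and the toy history are all hypothetical; the sketch only assumes that each node keeps a list of timestamped past interactions and that a sampler may use only interactions that occurred before the query time.

```python
import random
from typing import List, Tuple

# Hypothetical representation: (neighbor_id, timestamp) for one node's interactions.
Event = Tuple[int, float]


def past_events(history: List[Event], t: float) -> List[Event]:
    """Keep only interactions that happened strictly before query time t."""
    return [e for e in history if e[1] < t]


def most_recent_neighbors(history: List[Event], t: float, k: int) -> List[Event]:
    """Most-recent sampling: the k latest interactions before t, newest first."""
    past = sorted(past_events(history, t), key=lambda e: e[1], reverse=True)
    return past[:k]


def uniform_neighbors(
    history: List[Event], t: float, k: int, rng: random.Random
) -> List[Event]:
    """Uniform sampling: up to k interactions before t, drawn without replacement."""
    past = past_events(history, t)
    return rng.sample(past, min(k, len(past)))


# Toy history for a single node (assumed data, for illustration only).
history = [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0)]
recent = most_recent_neighbors(history, t=3.5, k=2)
uniform = uniform_neighbors(history, t=3.5, k=2, rng=random.Random(0))
```

Here `recent` always contains the two latest interactions before the query time, while `uniform` may return any two of the three eligible interactions, which is the behavioral difference the sampling comparison turns on.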
