DyG2Vec: Efficient Representation Learning for Dynamic Graphs

Mohammad Ali Alomrani; Mahdi Biparva; Yingxue Zhang; Mark Coates

TL;DR

DyG2Vec addresses the inefficiency of dynamic-graph representation learning with a window-based, attention-driven encoder that uses temporal edge encodings to produce task-agnostic node embeddings. It adds a non-contrastive self-supervised learning (SSL) objective for pre-training on unlabeled dynamic graphs, followed by downstream training with a fixed history window. On seven real-world benchmarks for future link prediction, it outperforms state-of-the-art baselines by an average of 4.23% in the transductive setting and 3.30% in the inductive setting while requiring 5-10x less training and inference time, and SSL pre-training provides additional gains in low-label regimes. These results indicate that window-based priors and SSL support scalable learning of temporal patterns and motifs in large continuous-time dynamic graphs (CTDGs).

Abstract

Temporal graph neural networks have shown promising results in learning inductive representations by automatically extracting temporal patterns. However, previous works often rely on complex memory modules or inefficient random walk methods to construct temporal representations. To address these limitations, we present an efficient yet effective attention-based encoder that leverages temporal edge encodings and window-based subgraph sampling to generate task-agnostic embeddings. Moreover, we propose a joint-embedding architecture using non-contrastive SSL to learn rich temporal embeddings without labels. Experimental results on 7 benchmark datasets indicate that on average, our model outperforms SoTA baselines on the future link prediction task by 4.23% for the transductive setting and 3.30% for the inductive setting while only requiring 5-10x less training/inference time. Lastly, different aspects of the proposed framework are investigated through experimental analysis and ablation studies. The code is publicly available at https://github.com/huawei-noah/noah-research/tree/master/graph_atlas.
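The abstract mentions non-contrastive SSL, and the section list names VICReg as a preliminary. As a rough illustration only, the sketch below computes a generic VICReg-style objective (invariance, variance, and covariance terms) on two batches of embeddings in plain Python. It is not taken from the DyG2Vec code: the function name `vicreg_loss`, the weights 25/25/1, the margin `gamma`, and `eps` are assumptions chosen for the example, and the exact loss used in the paper may differ.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # n rows (samples) by d columns (embedding dims)


def _variance_and_covariance_terms(z: Matrix, gamma: float, eps: float) -> Tuple[float, float]:
    """Return the (variance hinge term, off-diagonal covariance term) for one view."""
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in z]
    # Unbiased covariance matrix of the embedding dimensions.
    cov = [[sum(r[j] * r[k] for r in centered) / (n - 1) for k in range(d)]
           for j in range(d)]
    # Hinge on the per-dimension standard deviation: penalizes collapsed dimensions.
    var_term = sum(max(0.0, gamma - math.sqrt(cov[j][j] + eps)) for j in range(d)) / d
    # Squared off-diagonal covariances: penalizes redundant dimensions.
    cov_term = sum(cov[j][k] ** 2 for j in range(d) for k in range(d) if j != k) / d
    return var_term, cov_term


def vicreg_loss(za: Matrix, zb: Matrix,
                sim_w: float = 25.0, var_w: float = 25.0, cov_w: float = 1.0,
                gamma: float = 1.0, eps: float = 1e-4) -> Tuple[float, float, float, float]:
    """Return (total, invariance, variance, covariance) for two views of a batch.

    All weights and constants are illustrative assumptions, not values from DyG2Vec.
    """
    n = len(za)
    if n < 2 or n != len(zb):
        raise ValueError("both views need the same batch size, at least 2")
    # Invariance: mean over the batch of squared distances between paired embeddings.
    inv = sum(sum((a - b) ** 2 for a, b in zip(ra, rb)) for ra, rb in zip(za, zb)) / n
    var_a, cov_a = _variance_and_covariance_terms(za, gamma, eps)
    var_b, cov_b = _variance_and_covariance_terms(zb, gamma, eps)
    var, cov = var_a + var_b, cov_a + cov_b
    return sim_w * inv + var_w * var + cov_w * cov, inv, var, cov


if __name__ == "__main__":
    view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    view_b = [[0.9, 0.1], [0.1, 0.8], [-1.1, 0.0], [0.0, -0.9]]
    print(vicreg_loss(view_a, view_b))
```

The objective needs no negative samples: the variance term discourages the trivial solution where every embedding is identical, which is what makes the approach non-contrastive.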

Paper Structure (26 sections, 7 equations, 6 figures, 12 tables)

Introduction
Related Work
Problem formulation
Methodology
DyG2Vec Encoding Model
DyG2Vec Downstream Training
Self-supervised Pre-training for Dynamic Graphs
Experimental Evaluation
Experimental Setup
Experimental Results
Ablation and Sensitivity Analysis
Analysis
Conclusion
Appendix
Preliminary: VICReg
...and 11 more sections

Figures (6)

Figure 1: Using DyG2Vec window framework to encode the target node $u$. Every slice of the dynamic graph $\mathcal{G}$ contains edges that arrived at the same continuous timestamp. The blue interval represents the history graph $\mathcal{G}_{i - W, i}$ that is encoded to make a prediction on the target edge $(u, v)$. Note that both $u$ and $v$ share the same sampled history graph. For simplicity, we omit edge features $m_p$ from the attention encoder.
Figure 2: The joint embedding architecture for the non-contrastive SSL Framework. Each slice of the input dynamic graph contains edges arriving at the same continuous timestamp. $B$ is a batch of intervals of size $W$. $\hat{\mathcal{G}}$ is a batch of the corresponding input graphs of each interval.
Figure 3: Transductive FLP Performance (Test AP) vs Inference runtime (s) on 3 datasets. Inference time represents the time it takes to predict the whole test set. The test sets are approximately of size 400K, 600K, and 100K edges respectively.
Figure 4: First figure plots Semi-Supervised Learning results on Dynamic Node Classification. For each setting, DyG2Vec was trained on a varying random portion of the training data. Second figure plots the Average Attention Weight versus Relative Timespan for DyG2Vec trained with $W=64K$. The relative timespan is normalized with the maximum timespan across all interactions. A higher timespan means a farther interaction.
Figure 5: Ablation, sensitivity, and attention analysis on 3 datasets for the FLP transductive task.
...and 1 more figure
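The Figure 1 caption describes encoding a target edge $(u, v)$ from a history graph $\mathcal{G}_{i - W, i}$ that both endpoints share. A minimal sketch of that windowing step follows, assuming the dynamic graph is stored as a time-sorted list of `(source, destination, timestamp)` tuples and that $W$ counts edges rather than time units. The helper names `history_window` and `temporal_neighbors` are hypothetical, and using the time difference to the target edge as the temporal signal is an assumption for illustration rather than the paper's exact encoding.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source, destination, timestamp), sorted by timestamp


def history_window(edges: List[Edge], i: int, window: int) -> List[Edge]:
    """Return up to `window` edges that immediately precede the target edge at index i."""
    if not 0 <= i < len(edges):
        raise IndexError("target edge index out of range")
    if window < 0:
        raise ValueError("window must be non-negative")
    start = max(0, i - window)
    return edges[start:i]  # the target edge itself is excluded


def temporal_neighbors(history: List[Edge], node: int, t_target: float) -> List[Tuple[int, float]]:
    """List (neighbor, time elapsed until the target) for history edges touching `node`."""
    neighbors = []
    for u, v, t in history:
        if u == node:
            neighbors.append((v, t_target - t))
        elif v == node:
            neighbors.append((u, t_target - t))
    return neighbors


if __name__ == "__main__":
    edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0), (2, 3, 4.0), (0, 3, 5.0)]
    target_index = 4                      # predict the edge (0, 3) arriving at t = 5.0
    u, v, t = edges[target_index]
    history = history_window(edges, target_index, window=3)
    # Both endpoints are encoded from the same sampled history graph.
    print(temporal_neighbors(history, u, t))
    print(temporal_neighbors(history, v, t))
```

Because the window has a fixed size, the cost of encoding one target edge does not grow with the total length of the interaction stream, which is consistent with the efficiency argument made in the abstract.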
