From Link Prediction to Forecasting: Addressing Challenges in Batch-based Temporal Graph Learning

Moritz Lampert, Christopher Blöcker, Ingo Scholtes

TL;DR

This work critically examines batch-based evaluation on temporal graphs and shows that fixed-size batches cause information loss, information leakage, and inconsistent task difficulty across time windows, especially in continuous-time data. It reframes dynamic link prediction as dynamic link forecasting using fixed-duration time windows, mitigating leakage and aligning evaluation with real-world temporal patterns. Across 14 continuous-time and 6 discrete-time real-world datasets, the authors demonstrate substantial performance shifts between forecasting and prediction for state-of-the-art TGNNs, notably memory-based models, and provide practical implementations (SnapshotLoader, DyGLib extensions) to facilitate adoption. The study contributes a principled evaluation paradigm that yields fairer comparisons and more realistic assessments of model capabilities in temporal graphs, with implications for both research and applied deployments.

Abstract

Dynamic link prediction is an important problem often considered in recent works proposing various approaches for learning temporal edge patterns. To assess their efficacy, models are evaluated on benchmark datasets involving continuous-time and discrete-time temporal graphs. However, as we show in this work, the suitability of common batch-oriented evaluation depends on the datasets' characteristics, which can cause multiple issues: For continuous-time temporal graphs, fixed-size batches create time windows with different durations, resulting in an inconsistent dynamic link prediction task. For discrete-time temporal graphs, the sequence of batches can additionally introduce temporal dependencies that are not present in the data. In this work, we empirically show that this common evaluation approach leads to skewed model performance and hinders the fair comparison of methods. We mitigate this problem by reformulating dynamic link prediction as a link forecasting task that better accounts for temporal information present in the data.
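
The contrast between fixed-size batches and fixed-duration time windows can be illustrated with a small sketch. It is not the authors' SnapshotLoader or their DyGLib extension; the function names `fixed_size_batches` and `fixed_duration_windows` and the toy timestamps are assumptions made purely for illustration.

```python
from itertools import groupby


def fixed_size_batches(timestamps, batch_size):
    """Split time-sorted timestamps into consecutive batches of `batch_size` edges."""
    return [timestamps[i:i + batch_size] for i in range(0, len(timestamps), batch_size)]


def fixed_duration_windows(timestamps, duration):
    """Group time-sorted timestamps into windows of equal duration.

    Windows start at the first timestamp; windows without any edge are skipped.
    """
    start = timestamps[0]
    return [list(group) for _, group in groupby(timestamps, key=lambda t: (t - start) // duration)]


# Hypothetical edge timestamps: a burst of activity followed by sparse activity.
timestamps = [0, 1, 2, 3, 4, 5, 20, 40, 60, 80, 100, 119]

# Fixed batch size: every batch has 4 edges, but the time spans differ widely.
batches = fixed_size_batches(timestamps, batch_size=4)
batch_spans = [batch[-1] - batch[0] for batch in batches]

# Fixed duration: every window covers 40 time units, but the edge counts differ.
windows = fixed_duration_windows(timestamps, duration=40)
window_sizes = [len(window) for window in windows]

print("time span per fixed-size batch:", batch_spans)
print("edges per fixed-duration window:", window_sizes)
```

With a fixed batch size, the time horizon a model has to predict over changes from batch to batch. With a fixed duration, the horizon stays constant and only the number of edges per window varies.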

Paper Structure (41 sections, 9 equations, 13 figures, 18 tables)

This paper contains 41 sections, 9 equations, 13 figures, 18 tables.

Figures (13)

  • Figure 1: Illustration of the issues in batch-based evaluation: (1) Information loss: continuous-time temporal edges are grouped into batches, causing some TGNNs to discard their temporal ordering. (2) Information leakage: discrete-time temporal edges that belong to the same snapshot are assigned to different batches, imposing an artificial ordering between edges that belong to the same snapshot. (3) Varying time window durations: using a fixed batch size creates time windows of different lengths when temporal edges are inhomogeneously distributed. (4) Tunable batch size: when different models choose different batch sizes, they group the temporal edges into different time windows, thus making their results incomparable because the specifics of their prediction tasks differ.
  • Figure 2: NMI (y-axis) measures the temporal information loss as a function of different batch sizes (x-axis), where smaller NMI values indicate more information loss.
  • Figure 3: Real-world datasets exhibit diverse edge occurrence patterns, visualised here as the edge density across time, i.e., histograms counting the number of edges per timestamp for three continuous-time temporal graphs and one discrete-time temporal graph. Dashed lines divide the datasets into 70% train, 15% validation, and 15% test sets as used in the experiments section. The other datasets' histograms are shown in the appendix on edge activity patterns.
  • Figure 4: Model performance on selected continuous-time datasets varies substantially over time, as visualised by the AUC-ROC scores for each batch and each time window individually. For both datasets, the best-performing memory-based and non-memory-based models are shown. The time (x-axis) is limited to the test set. The scores for other models and datasets are reported in the appendix on AUC over time.
  • Figure 5: Real-world datasets exhibit diverse edge occurrence patterns, visualised here as the edge density across time, i.e., histograms counting the number of edges per timestamp. Dashed lines divide the datasets into 70% train, 15% validation, and 15% test sets as used in the experiments section.
  • ...and 8 more figures
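
The relationship described in the Figure 2 caption, where larger batches retain less temporal information, can be reproduced on toy data with a minimal sketch. It assumes that NMI is computed between each edge's timestamp and its batch index, and that mutual information is normalised by the arithmetic mean of the two entropies; the paper's exact normalisation may differ. The helper names `entropy`, `nmi`, and `batch_ids` are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of the empirical distribution of `labels`."""
    n = len(labels)
    return -sum(count / n * log(count / n) for count in Counter(labels).values())


def nmi(xs, ys):
    """Mutual information I(X;Y) divided by the arithmetic mean of H(X) and H(Y).

    The choice of normalisation is an assumption, not taken from the paper.
    """
    h_x, h_y = entropy(xs), entropy(ys)
    h_xy = entropy(list(zip(xs, ys)))  # joint entropy H(X, Y)
    mutual_information = h_x + h_y - h_xy
    mean_entropy = (h_x + h_y) / 2
    return mutual_information / mean_entropy if mean_entropy > 0 else 1.0


def batch_ids(n_edges, batch_size):
    """Batch index of each edge when time-sorted edges are split into fixed-size batches."""
    return [i // batch_size for i in range(n_edges)]


# Hypothetical continuous-time data: 12 edges, each with a unique timestamp.
timestamps = list(range(12))

scores = {size: nmi(timestamps, batch_ids(len(timestamps), size)) for size in (1, 3, 6)}
for size, score in scores.items():
    print(f"batch size {size}: NMI = {score:.3f}")
```

With a batch size of 1, the batch index determines the timestamp exactly and the NMI is 1. As the batch size grows, more edges with distinct timestamps share one batch index and the NMI drops, which matches the reading that smaller NMI values indicate more information loss.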

Theorems & Definitions (4)

  • Definition 2.1
  • Definition 3.1
  • Definition 2.1: Mutual Information
  • Example 2.1