
Transfer Learning for Temporal Link Prediction

Ayan Chatterjee, Barbara Ikica, Babak Ravandi, John Palowitch

TL;DR

The paper addresses transferability in temporal link prediction by showing that memory-centric Temporal Graph Networks generalize poorly to entirely new graphs. It introduces a structural mapping approach that learns to translate graph topological features into memory embeddings, enabling zero-shot initialization of unseen nodes. Across Temporal Graph Benchmark datasets, the structural map can achieve deployment performance comparable to or better than fine-tuning, reducing the need for test-time adaptation. The paper discusses limitations such as divergent loss dynamics and seed sensitivity, and proposes future work on richer topological signals and broader cross-graph transfer.

Abstract

Link prediction on graphs has applications ranging from recommender systems to drug discovery. Temporal link prediction (TLP) refers to predicting future links in a temporally evolving graph and adds complexity related to the dynamic nature of graphs. State-of-the-art TLP models incorporate memory modules alongside graph neural networks to learn both the temporal mechanisms of incoming nodes and the evolving graph topology. However, memory modules only store information about nodes seen at train time, and hence such models cannot be directly transferred to entirely new graphs at test time and deployment. In this work, we study a new transfer learning task for temporal link prediction, and develop transfer-effective methods for memory-laden models. Specifically, motivated by work showing the informativeness of structural signals for the TLP task, we augment existing TLP model architectures with a structural mapping module, which learns a mapping from graph structural (topological) features to memory embeddings. Our work paves the way for a memory-free foundation model for TLP.
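To make the idea of a structural map concrete, here is a minimal pure-Python sketch of a small feed-forward network that turns a node's structural feature vector into a memory embedding and uses it to initialize the memory of a node never seen in training. It is an illustration under stated assumptions, not the authors' implementation: the three-feature input, the layer sizes, the ReLU activations, the random (untrained) weights, and the names `structural_map` and `init_memory` are all hypothetical.

```python
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (untrained, illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, relu):
    """Apply one dense layer, optionally followed by a ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def structural_map(features, layers):
    """Map a structural feature vector to a memory embedding (ReLU on all but the last layer)."""
    h = features
    for i, layer in enumerate(layers):
        h = dense(h, layer, relu=(i < len(layers) - 1))
    return h


# Hypothetical sizes: 3 structural features in, 4-dimensional memory out.
FEATURE_DIM, HIDDEN_DIM, MEMORY_DIM = 3, 8, 4
rng = random.Random(0)
layers = [
    make_layer(FEATURE_DIM, HIDDEN_DIM, rng),
    make_layer(HIDDEN_DIM, HIDDEN_DIM, rng),
    make_layer(HIDDEN_DIM, MEMORY_DIM, rng),
]

memory = {}  # node id -> memory embedding


def init_memory(node, features):
    """Initialize memory for an unseen node from its structural features; keep existing entries."""
    if node not in memory:
        memory[node] = structural_map(features, layers)
    return memory[node]


embedding = init_memory("new_node", [2.0, 1.0, 0.5])
```

In a trained model the weights would be learned on the training graph, so that the same function can be applied unchanged to nodes of a disjoint graph.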

Paper Structure

This paper contains 21 sections, 4 equations, 7 figures.

Figures (7)

  • Figure 1: The majority of the trained parameters in TGN pertain to the training graph and are not transferable. (a) Representation of different components of the TGN model trained on the tgbl-comment dataset. Here, blue circles are trainable, whereas the other blocks are intermediate state embeddings. We observe that the majority of the variable parameters ($\sim1M$) are associated with the dataset, and only a small fraction of parameters ($\sim150k$) pertain to the model architecture. Hence, the trained TGN model is highly specific to the training data and is inherently less transferable. (b) Transfer learning task for temporal link prediction. We want to train a TLP model on $G$, a community of the larger network, and transfer it to a disjoint community $G'$.
  • Figure 2: Transfer learning framework and structural mapping overview. (a) We train and validate the TLP model on the training graph $G$. The validation set is used for early stopping of model training. During deployment, we learn the memory embeddings of a fraction of the unseen nodes in $G'$ by fine-tuning the model on a fraction of $G'$. This model is thereafter used to derive the test performance on the remainder of $G'$. (b) We learn a structural map during training on $G$. This map learns a function from graph topological features to memory embeddings, and when a new node is encountered at test time, we use this learned mapping to initialize its memory embedding from deterministic topological features.
  • Figure 3: Temporal aggregation and structural map architecture. (a) For each newly observed node at test time, we aggregate the past edges to construct an aggregated graph, which is used for computing the topological features of the newly incoming node. We use $1\%$ of the time-span of the benchmark datasets as an aggregation window in our experiments. (b) Overview of the structural map-augmented TGN architecture. We use a 3-layer multilayer perceptron (MLP) as the structural map module. We combine the loss from the link prediction decoder and the structural map module and train the model in an end-to-end fashion.
  • Figure 4: Fine-tuning on a fraction of the test graph improves loss during deployment. We observe lower loss when fine-tuning is implemented in TGN during transfer learning. The improvement achieved by fine-tuning is consistent across multiple benchmark datasets, including tgbl-wiki, tgbl-review, and tgbl-flight. Here, metric value refers to the total loss of the TGN model.
  • Figure 5: TGN with structural mapping improves transfer loss during deployment on an unseen temporal graph. The tgbl-flight dataset is tested in different transfer learning scenarios. TGN, TGN with fine-tuning, and TGN Structural Map are used in this study. The train and validation graphs pertain to the flights and airports contained in a certain continent, whereas the test graphs are derived from a continent different from the ones used in train and validation. This ensures that the airports trained and validated on are disjoint from the airports tested on. We show that TGN Structural Map can achieve similar or lower model loss compared to the fine-tuned version on the test dataset. The observations are consistent across 4 different transfer learning scenarios.
  • ...and 2 more figures
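
The temporal aggregation step described in the Figure 3 caption can be sketched as follows. This is a rough illustration, not the paper's code: the edge format `(src, dst, t)`, the inclusive window `[now - window, now]`, the undirected unweighted aggregation, and the two features (degree and mean neighbour degree) are assumptions chosen for the example, and all function names are hypothetical.

```python
from collections import defaultdict


def aggregation_window(edges, fraction=0.01):
    """Window length as a fraction of the dataset's time-span (1% in the caption)."""
    times = [t for _, _, t in edges]
    return fraction * (max(times) - min(times))


def aggregate(edges, now, window):
    """Undirected adjacency built from edges with now - window <= t <= now."""
    adj = defaultdict(set)
    for src, dst, t in edges:
        if now - window <= t <= now:
            adj[src].add(dst)
            adj[dst].add(src)
    return adj


def topological_features(adj, node):
    """Hypothetical feature vector: [degree, mean degree of neighbours]."""
    neighbours = adj.get(node, set())
    degree = len(neighbours)
    mean_nbr_degree = sum(len(adj[n]) for n in neighbours) / degree if degree else 0.0
    return [float(degree), mean_nbr_degree]


# Toy temporal edge list spanning t = 0 .. 1000, so the 1% window is 10 time units.
edges = [
    ("a", "b", 0.0),
    ("b", "c", 995.0),
    ("c", "d", 998.0),
    ("d", "e", 1000.0),
    ("c", "e", 1000.0),
]
window = aggregation_window(edges)
adj = aggregate(edges, now=1000.0, window=window)
features = topological_features(adj, "e")  # node "e" first appears at t = 1000
```

Features computed this way depend only on recent graph structure, which is what allows them to be evaluated for nodes that have no stored memory.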