Robust Training of Temporal GNNs using Nearest Neighbours based Hard Negatives

Shubham Gupta, Srikanta Bedathur

TL;DR

This work targets a suboptimal aspect of training temporal graph neural networks (TGNNs): uniform negative sampling often yields uninformative negatives that hinder model convergence for future-link prediction. It develops a theoretical framework linking negative sampling to gradient variance and proposes a dynamic, top-K hard-negative sampling strategy derived from embedding similarities and loss-based proxies, implemented with periodic embedding refresh and caching to manage overhead. Empirical results on Wikipedia Edits, Reddit Posts, and Twitter Retweets show consistent improvements over standard TGNN training and heuristic baselines, with the best gains achieved when combining uniform and hard negatives in a hybrid setup. The approach is practical for real-world temporal graphs, offering a principled, scalable method to boost recommendation performance without changing inference-time behavior.

Abstract

Temporal graph neural networks (TGNNs) have exhibited state-of-the-art performance in future-link prediction tasks. These TGNNs are trained with an unsupervised loss based on uniform random negative sampling. During training, for a given positive example, the loss is computed over uninformative negatives, which introduces redundancy and leads to sub-optimal performance. In this paper, we propose a modified unsupervised learning scheme for TGNNs that replaces uniform negative sampling with importance-based negative sampling. We theoretically motivate and define the dynamically computed distribution from which negative examples are sampled. Finally, using empirical evaluations over three real-world datasets, we show that TGNNs trained with a loss based on the proposed negative sampling provide consistently superior performance.
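
The idea of replacing uniform negatives with nearest-neighbour hard negatives can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product similarity, the function names (`top_k_hard_negatives`, `hybrid_negatives`), the `EmbeddingCache` class, and all parameters are assumptions introduced here for illustration. The sketch picks the candidates most similar to the source node (excluding the true target), mixes them with uniformly drawn negatives, and reuses a cached embedding snapshot that is refreshed every few steps.

```python
import heapq
import random


def dot(u, v):
    """Dot-product similarity (an assumed choice of similarity measure)."""
    return sum(a * b for a, b in zip(u, v))


def top_k_hard_negatives(src_emb, cand_embs, positive_idx, k):
    """Indices of the k candidates most similar to the source, skipping the positive."""
    scored = [
        (dot(src_emb, emb), i)
        for i, emb in enumerate(cand_embs)
        if i != positive_idx
    ]
    # nlargest returns the pairs sorted from most to least similar.
    return [i for _, i in heapq.nlargest(k, scored)]


def hybrid_negatives(src_emb, cand_embs, positive_idx, k, n_hard, n_uniform, rng):
    """Draw n_hard negatives from the top-k pool and n_uniform negatives uniformly."""
    others = [i for i in range(len(cand_embs)) if i != positive_idx]
    if not others:
        return [], []
    pool = top_k_hard_negatives(src_emb, cand_embs, positive_idx, k)
    hard = rng.sample(pool, min(n_hard, len(pool)))
    uniform = [rng.choice(others) for _ in range(n_uniform)]
    return hard, uniform


class EmbeddingCache:
    """Keeps a snapshot of the embeddings and refreshes it every `refresh_period` calls."""

    def __init__(self, refresh_period):
        self.refresh_period = refresh_period
        self.snapshot = None
        self.step = 0

    def get(self, current_embs):
        if self.snapshot is None or self.step % self.refresh_period == 0:
            self.snapshot = [list(e) for e in current_embs]  # copy, so later updates do not leak in
        self.step += 1
        return self.snapshot


# Toy usage: node 0 is the true target of the source; the rest are candidates.
source = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.8, 0.2]]
cache = EmbeddingCache(refresh_period=2)
hard, uniform = hybrid_negatives(
    source, cache.get(candidates), positive_idx=0,
    k=2, n_hard=1, n_uniform=3, rng=random.Random(0),
)
```

In this sketch, searching a stale snapshot trades some accuracy of the neighbour search for lower cost, and the uniform share keeps easy negatives in the mix. In a real training loop the brute-force scan would typically be replaced by an approximate nearest-neighbour index.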

Paper Structure (16 sections, 20 equations, 7 figures, 3 tables, 1 algorithm)


Figures (7)

  • Figure 1: 2D t-SNE representation of node embeddings for various interactions between source and target node pairs. These representations are computed at time $t$, just before the interactions. The learned temporal embeddings of past nodes and target nodes lie nearby, and the target node is often farther from the source node than the past nodes are. In addition, a few random nodes have representations closer to the source node than the target node is. This results in sub-optimal performance of TGNNs in recommendation tasks.
  • Figure 2: Performance of the proposed method when increasing the index refresh period (P)
  • Figure 3: Performance of the proposed method when increasing k in top-k negative sampling
  • Figure 4: Performance variation when varying the number of hard negative samples while training TGN on the Wikipedia dataset
  • Figure 5: Influence of varying the learning rate on TGN performance on the Wikipedia dataset
  • ...and 2 more figures

Theorems & Definitions (2)

  • Definition 1: Continuous time temporal graph
  • Definition 2: Temporal Neighborhood