Robust Training of Temporal GNNs using Nearest Neighbours based Hard Negatives
Shubham Gupta, Srikanta Bedathur
TL;DR
This work targets a suboptimal aspect of training temporal graph neural networks (TGNNs): uniform negative sampling often yields uninformative negatives that hinder model convergence for future-link prediction. It develops a theoretical framework linking negative sampling to gradient variance and proposes a dynamic, top-K hard-negative sampling strategy derived from embedding similarities and loss-based proxies, implemented with periodic embedding refresh and caching to manage overhead. Empirical results on Wikipedia Edits, Reddit Posts, and Twitter Retweets show consistent improvements over standard TGNN training and heuristic baselines, with the best gains achieved when combining uniform and hard negatives in a hybrid setup. The approach is practical for real-world temporal graphs, offering a principled, scalable method to boost recommendation performance without changing inference-time behavior.
Abstract
Temporal graph neural networks (TGNNs) have exhibited state-of-the-art performance in future-link prediction tasks. These TGNNs are trained with an unsupervised loss based on uniform random negative sampling. As a result, for a given positive example, the loss is often computed over uninformative negatives, which introduces redundancy and leads to sub-optimal performance. In this paper, we propose a modified unsupervised learning scheme for TGNNs that replaces uniform negative sampling with importance-based negative sampling. We theoretically motivate and define a dynamically computed distribution for sampling negative examples. Finally, using empirical evaluations on three real-world datasets, we show that TGNNs trained with a loss based on the proposed negative sampling consistently achieve superior performance.
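To make the idea of importance-based negative sampling concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the importance of a candidate negative is a softmax over dot-product similarity to the source node's current embedding, mixed with a uniform distribution. The function names and the `temperature` and `uniform_mix` parameters are hypothetical and chosen for illustration only; the distribution actually used is the one defined in the body of the paper.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_probs(src_emb, cand_embs, temperature=1.0, uniform_mix=0.5):
    """Return a probability for each candidate negative.

    Assumption: importance is a softmax over similarity to the source
    embedding, blended with a uniform distribution. uniform_mix=1.0
    recovers plain uniform negative sampling.
    """
    scores = [dot(src_emb, c) / temperature for c in cand_embs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    n = len(cand_embs)
    return [uniform_mix / n + (1.0 - uniform_mix) * e / total for e in exps]


def sample_negatives(src_emb, node_embs, positive_idx, k, rng, **kwargs):
    """Draw k negative node indices (with replacement), never the positive."""
    candidates = [i for i in range(len(node_embs)) if i != positive_idx]
    probs = negative_sampling_probs(
        src_emb, [node_embs[i] for i in candidates], **kwargs
    )
    return rng.choices(candidates, weights=probs, k=k)


# Toy example: node 0 is the true destination; node 1 is a "hard" negative
# because its embedding is close to the source embedding.
src = [1.0, 0.0]
nodes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
negatives = sample_negatives(src, nodes, positive_idx=0, k=5, rng=random.Random(0))
```

In this sketch, lowering `uniform_mix` concentrates sampling on negatives that the current embeddings find hard to separate from the positive, while keeping part of the uniform component preserves coverage of the remaining nodes. Because the probabilities depend on the current embeddings, they would need to be recomputed as training proceeds.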
