FLASH: Flexible Learning of Adaptive Sampling from History in Temporal Graph Neural Networks

Or Feldman, Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Chaim Baskin, Moshe Eliasof

TL;DR

This work tackles the inefficiency and rigidity of historical-neighborhood sampling in temporal graph neural networks by introducing FLASH, a learnable, graph-adaptive sampling framework. FLASH learns to score historical neighbors using spatial-temporal embeddings and a link-aware context, selecting the top-$k$ neighbors via differentiable scoring and training with a self-supervised ranking objective that compares against uniform baselines. Theoretical results show FLASH strictly surpasses traditional heuristics in expressiveness, and extensive experiments across multiple TGNN backbones and dynamic-graph benchmarks demonstrate consistent improvements with manageable overhead. The proposed approach enables TGNNs to leverage long histories more effectively, enhancing future link prediction in dynamic graphs without requiring architectural changes to existing models.

Abstract

Aggregating temporal signals from historical interactions is a key step in future link prediction on dynamic graphs. However, incorporating long histories is resource-intensive. Hence, temporal graph neural networks (TGNNs) often rely on heuristics for sampling historical neighbors, such as uniform sampling or recent-neighbor selection. These heuristics are static and fail to adapt to the underlying graph structure. We introduce FLASH, a learnable and graph-adaptive neighborhood selection mechanism that generalizes existing heuristics. FLASH integrates seamlessly into TGNNs and is trained end-to-end using a self-supervised ranking loss. We provide theoretical evidence that commonly used heuristics hinder TGNN performance, motivating our design. Extensive experiments across multiple benchmarks demonstrate consistent and significant performance improvements for TGNNs equipped with FLASH.
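
To make the contrast between static heuristics and score-based selection concrete, the following is a minimal Python sketch. It is not FLASH's implementation: the function names, the `(neighbor_id, timestamp)` history format, and the hand-written score function are all assumptions made for illustration. The hard top-$k$ used here is also not differentiable, whereas the method described above learns its scores end-to-end.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical history format: one (neighbor_id, timestamp) pair per past interaction.
Interaction = Tuple[int, float]


def select_recent(history: List[Interaction], k: int) -> List[Interaction]:
    """Static heuristic: keep the k most recent interactions (truncation)."""
    return sorted(history, key=lambda e: e[1], reverse=True)[:k]


def select_uniform(history: List[Interaction], k: int,
                   rng: random.Random) -> List[Interaction]:
    """Static heuristic: sample k interactions uniformly without replacement."""
    return rng.sample(history, min(k, len(history)))


def select_top_k(history: List[Interaction], k: int,
                 score: Callable[[Interaction], object]) -> List[Interaction]:
    """Score-based selection: keep the k highest-scoring interactions.

    `score` stands in for a learned relevance function; here it is any
    callable returning a sortable value.
    """
    return sorted(history, key=score, reverse=True)[:k]


history = [(7, 1.0), (3, 2.0), (9, 3.0), (3, 4.0), (5, 5.0)]

recent = select_recent(history, 2)
uniform = select_uniform(history, 2, random.Random(0))

# Hand-written stand-in for a learned score: treat neighbor 7 as the
# relevant one and break ties by recency.
scored = select_top_k(history, 2, lambda e: (e[0] == 7, e[1]))
```

In this toy history, `recent` drops the old interaction with neighbor 7, while `scored` keeps it. Passing the timestamp itself as the score (`lambda e: e[1]`) makes `select_top_k` return the same result as `select_recent`, which is one simple sense in which score-based selection can subsume a static heuristic.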

Paper Structure

This paper contains 25 sections, 7 theorems, 15 equations, 4 figures, 12 tables.

Key Result

Theorem 1

For any $k$, there exists a dynamic graph on which any TGNN that applies $k$-recent selection cannot learn.

Figures (4)

  • Figure 1: Illustration of different neighborhood selection strategies for predicting a link between $v_i$ and $v_j$. Circles represent nodes and their colors indicate each node’s feature. One neighbor (matching $v_j$'s feature color) and a "bridge" neighbor (in yellow, connecting $v_i$ and $v_j$) are especially relevant. The bar chart on the right shows how each strategy scores these neighbors. Static heuristics (truncation or uniform sampling) either discard them or fail to prioritize them. By contrast, FLASH adaptively assigns higher scores to these key neighbors.
  • Figure 2: Overview of FLASH. Each historical neighbor $u$ is assigned a relevance score based on its temporal, spatial, and structural relationships with $v_i$ and $v_j$. The highest-scoring neighbors are selected via differentiable sampling.
  • Figure 3: FLASH vs. Truncation baseline on the TGB benchmark. Results are reported in MRR (for dynamic link prediction) with random negative sampling over three different runs, using $k=4$ and $k=8$ historical neighbors.
  • Figure 4: Impact of increasing the number of sampled neighbors on MOOC (left) vs. SocialEvo (right). The performance gap between Truncation and FLASH when using 2 neighbors on SocialEvo is highlighted in light red. As the number of sampled neighbors increases, the gap shrinks.

Theorems & Definitions (13)

  • Theorem 1
  • Theorem 2
  • Theorem 1
  • proof
  • proof
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • proof
  • ...and 3 more