TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning

Gangda Deng; Hongkuan Zhou; Hanqing Zeng; Yinglong Xia; Christopher Leung; Jianbo Li; Rajgopal Kannan; Viktor Prasanna

TL;DR

This work proposes TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability, and implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache.

Abstract

Recently, Temporal Graph Neural Networks (TGNNs) have demonstrated state-of-the-art performance in various high-impact applications, including fraud detection and content recommendation. Despite this success, TGNNs are vulnerable to the noise prevalent in real-world dynamic graphs, such as time-deprecated links and skewed interaction distributions. The noise causes two critical issues that significantly compromise the accuracy of TGNNs: (1) models are supervised by inferior interactions, and (2) noisy input induces high variance in the aggregated messages. However, current TGNN denoising techniques do not consider the diverse and dynamic noise patterns of each node. They also suffer from excessive mini-batch generation overhead caused by traversing more neighbors. We believe the remedy for fast and accurate TGNNs lies in temporal adaptive sampling. In this work, we propose TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability. TASER adapts its mini-batch selection based on training dynamics and its temporal neighbor selection based on the contextual, structural, and temporal properties of past interactions. To alleviate the bottleneck in mini-batch generation, TASER implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache. We evaluate the performance of TASER using two state-of-the-art backbone TGNNs. On five popular datasets, TASER outperforms the corresponding baselines by an average of 2.3% in Mean Reciprocal Rank (MRR) while achieving an average of 5.1x speedup in training time.
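The abstract reports accuracy in Mean Reciprocal Rank (MRR). As a minimal sketch of how that metric is commonly computed for link prediction, the Python below ranks each true destination against a set of sampled negatives and averages the reciprocal ranks. The function name, the input layout, and the tie-handling rule (only strictly higher negative scores push the rank down) are assumptions made for illustration, not details taken from the paper.

```python
def mean_reciprocal_rank(pos_scores, neg_scores):
    """Average of 1/rank of the true destination over all queries.

    pos_scores[i]: model score of the true destination for query i.
    neg_scores[i]: list of scores of sampled negative destinations for query i.
    Both layouts are hypothetical, chosen only for this illustration.
    """
    if len(pos_scores) == 0 or len(pos_scores) != len(neg_scores):
        raise ValueError("need one non-empty list of negatives per positive")
    total = 0.0
    for pos, negs in zip(pos_scores, neg_scores):
        # Assumption: ties are resolved in favour of the positive, so only
        # strictly higher negative scores increase the rank.
        rank = 1 + sum(1 for s in negs if s > pos)
        total += 1.0 / rank
    return total / len(pos_scores)


if __name__ == "__main__":
    # Query 0: the positive outranks both negatives (rank 1).
    # Query 1: one negative scores higher (rank 2).
    print(mean_reciprocal_rank([0.9, 0.2], [[0.1, 0.5], [0.3, 0.1]]))
```

An MRR of 1.0 means every true destination was ranked first; the value falls toward 0 as true destinations sink in the ranking.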

Paper Structure (18 sections, 16 equations, 4 figures, 3 tables, 3 algorithms)

Introduction
Background
Neighbor Finder
Temporal Aggregator
Related Works
Approach
Temporal Adaptive Mini-batch Selection
Temporal Adaptive Neighbor Sampling
GPU Temporal Neighbor Finding
GPU Feature Caching
Experiments
Experimental Setup
Accuracy
Runtime
GPU Neighbor Finder
...and 3 more sections

Figures (4)

Figure 1: Runtime (per epoch) breakdown for TGAT with different numbers of neighbors per layer. Prep. refers to the mini-batch generation time (neighbor finding, feature slicing, and CPU-GPU data transfer), while Prop. refers to the propagation time (forward and backward propagation).
Figure 2: One training iteration of TASER on a one-layer TGNN. (a) Randomly select a set of mini-batch samples based on the pre-computed importance score $\mathcal{P}$ proportional to the logits (temporal adaptive mini-batch selection). (b) Sample a subset of neighbors from the temporal neighborhood using our GPU temporal neighbor finder. (c) Slice the features of sampled neighbors from the VRAM cache and RAM. (d) Apply temporal adaptive neighbor sampling (parameterized by $\theta$) to sub-sample the supporting neighbors for TGNN by encoding timestamps, frequencies, and identities along with features. (e) Perform forward and backward propagation. Update the importance score $\mathcal{P}$ for adaptive mini-batch selection and back-propagate through the model loss and sample loss to train the TGNN model and temporal adaptive sampler.
Figure 3: (a) Total sampling time per epoch of a $2$-layer TGAT with different neighbor finders and different numbers of neighbors per layer. (b) Cache hit rate of the TASER caching strategy and the Oracle caching strategy across training epochs.
Figure 4: Test MRR of (a) TGAT and (b) GraphMixer with TASER on the Wikipedia dataset. $m$ and $n$ denote the numbers of neighbors selected by the neighbor finder and the adaptive neighbor sampler, respectively.
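
Steps (a) and (b) in the Figure 2 caption can be illustrated with a small CPU-only sketch. It is not the paper's GPU implementation: the function names, the per-node sorted-timestamp layout, sampling with replacement, and the "most recent m neighbors" policy are all assumptions chosen to keep the example short.

```python
import bisect
import random


def select_minibatch(importance, batch_size, rng):
    """Step (a), sketched: draw training-sample indices with probability
    proportional to a pre-computed importance score.

    Assumption: sampling is done with replacement via random.choices.
    """
    return rng.choices(range(len(importance)), weights=importance, k=batch_size)


def recent_neighbors(timestamps, neighbors, query_time, m):
    """Step (b), sketched: return up to m neighbors whose interaction
    happened strictly before query_time, keeping the most recent ones.

    timestamps must be sorted ascending and aligned with neighbors
    (a hypothetical per-node layout).
    """
    # Binary search for the first interaction at or after query_time.
    end = bisect.bisect_left(timestamps, query_time)
    start = max(0, end - m)
    return neighbors[start:end]


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [0.1, 0.7, 0.2]  # hypothetical importance scores
    print(select_minibatch(scores, batch_size=4, rng=rng))
    print(recent_neighbors([1, 3, 5, 7], ["a", "b", "c", "d"], query_time=6, m=2))
```

The binary search restricts the lookup to interactions that precede the query time, which is the temporal constraint a neighbor finder for dynamic graphs has to respect; a different policy (for example uniform sampling over the valid prefix) would only change how the slice is chosen.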

Theorems & Definitions (1)

Remark
