MSPipe: Efficient Temporal GNN Training via Staleness-Aware Pipeline

Guangming Sheng; Junwei Su; Chao Huang; Chuan Wu


TL;DR

MSPipe introduces an online pipeline scheduling algorithm that strategically breaks temporal dependencies with minimal staleness and delays memory fetching to obtain fresher memory states, making it a promising approach for efficient MTGNN training.

Abstract

Memory-based Temporal Graph Neural Networks (MTGNNs) are a class of temporal graph neural networks that utilize a node memory module to capture and retain long-term temporal dependencies, leading to superior performance compared to memory-less counterparts. However, the iterative reading and updating process of the memory module in MTGNNs to obtain up-to-date information needs to follow the temporal dependencies. This introduces significant overhead and limits training throughput. Existing optimizations for static GNNs are not directly applicable to MTGNNs due to differences in training paradigm, model architecture, and the absence of a memory module. Moreover, they do not effectively address the challenges posed by temporal dependencies, making them ineffective for MTGNN training. In this paper, we propose MSPipe, a general and efficient framework for MTGNNs that maximizes training throughput while maintaining model accuracy. Our design addresses the unique challenges associated with fetching and updating node memory states in MTGNNs by integrating staleness into the memory module. However, simply introducing a predefined staleness bound in the memory module to break temporal dependencies may lead to suboptimal performance and lack of generalizability across different models and datasets. To solve this, we introduce an online pipeline scheduling algorithm in MSPipe that strategically breaks temporal dependencies with minimal staleness and delays memory fetching to obtain fresher memory states. Moreover, we design a staleness mitigation mechanism to enhance training convergence and model accuracy. We provide convergence analysis and prove that MSPipe maintains the same convergence rate as vanilla sample-based GNN training. Experimental results show that MSPipe achieves up to 2.45x speed-up without sacrificing accuracy, making it a promising solution for efficient MTGNN training.
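
To make the idea of a staleness bound concrete, the following is a minimal sketch and not the MSPipe scheduler itself. It assumes a single memory state identified only by a version counter, and a hypothetical function `stale_read_versions` that records which version each training iteration would read when reads may lag the latest write by at most `k` iterations. Setting `k = 0` corresponds to the strictly sequential read-after-write order described in the abstract.

```python
from collections import deque


def stale_read_versions(num_iters, k):
    """Return the memory version read by each iteration under staleness bound k.

    Hypothetical toy model: iteration i writes version i + 1, and every read
    returns the oldest of the last k + 1 versions, so it lags by at most k.
    """
    version = 0
    recent = deque([version], maxlen=k + 1)  # keeps only the last k + 1 versions
    reads = []
    for _ in range(num_iters):
        reads.append(recent[0])  # oldest retained version: at most k writes behind
        version += 1             # this iteration's memory update
        recent.append(version)
    return reads


print(stale_read_versions(5, 0))  # [0, 1, 2, 3, 4]
print(stale_read_versions(5, 2))  # [0, 0, 0, 1, 2]
```

With `k = 2`, iteration 3 reads version 1 while version 3 is already available, which is the kind of lag that lets a training stage proceed without waiting for the most recent memory update. How MSPipe actually chooses the lag is determined by its online scheduling algorithm, which this sketch does not model.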


Paper Structure (41 sections, 3 theorems, 23 equations, 23 figures, 11 tables, 1 algorithm)


Introduction
Preliminary
MSPipe framework
MSPipe mechanism
Stall-free Minimal-staleness Pipeline
Similarity-based Staleness mitigation
Theoretical Analysis
Experiments
Experiment settings
Expedited Training While Maintaining Accuracy
Preserving Convergence Rate
Stall-free Minimal Staleness Bound
Staleness Mitigation Mechanism
GPU memory and utilization
Related Works
...and 26 more sections

Key Result

Theorem 1

With a memory-based TGNN model, suppose that 1) there is a bounded difference between the stale node memory vector $\tilde{s}^{(i)}_{v}$ and the exact node memory vector $s^{(i)}_{v}$ with the staleness bound $\epsilon_s$, i.e., $\Vert \tilde{s}^{(i)}_{v} - s^{(i)}_{v} \Vert_{F} \leq \epsilon_s$ ... where $W_{0}$, $W_t$ and $W^\ast$ are the initial, step-$t$ and optimal model parameters, respectively ...
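
The bounded-difference assumption can be checked numerically on toy data. The sketch below is illustrative only: the function name `max_staleness_gap`, the node identifiers, the memory vectors, and the value of `epsilon_s` are all hypothetical. It relies on the fact that the Frobenius norm of a vector equals its Euclidean norm.

```python
import math


def max_staleness_gap(stale, exact):
    """Largest Euclidean distance between stale and exact memory vectors over all nodes."""
    return max(math.dist(stale[v], exact[v]) for v in exact)


# Hypothetical two-node example with 2-dimensional memory vectors.
exact = {"a": [1.0, 2.0], "b": [0.0, 0.0]}
stale = {"a": [1.0, 2.0], "b": [3.0, 4.0]}

epsilon_s = 6.0  # assumed bound, chosen only for illustration
gap = max_staleness_gap(stale, exact)
print(gap, gap <= epsilon_s)  # 5.0 True
```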

Figures (23)

Figure 1: Memory-based TGNN training. (a) represents the general training scheme; (b) shows the pre-sampling and pre-fetching optimization; (c) is the case of breaking the temporal dependency, where the TGNN training stage is executed uninterruptedly.
Figure 2: Memory-based TGNN Training Stages. The node memory states are stored in the CPU memory to ensure consistency among multiple training workers and reduce GPU memory contention. The MTGNN model is stored in the GPU.
Figure 3: Pipeline execution. The dashed black arrow represents the bubble time. The red arrow denotes memory fetching to retrieve memory vectors updated $k$ iterations before.
Figure 4: Model accuracy and training throughput at different staleness bounds.
Figure 5: Different resource requirements (by color/shape) of 5 training stages.
...and 18 more figures

Theorems & Definitions (3)

Theorem 1: Convergent result, informal
Lemma 1
Lemma 2
