FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training

Kezhao Huang; Haitian Jiang; Minjie Wang; Guangxuan Xiao; David Wipf; Xiang Song; Quan Gan; Zengfeng Huang; Jidong Zhai; Zheng Zhang

TL;DR

FreshGNN tackles the memory-access bottleneck in large-scale GNN training by introducing a selective historical embedding cache. It uses a gradient-based stability signal and a staleness threshold to admit embeddings into a bidirectional cache that also coordinates with static features, guided by a cost model to maximize IO savings. The system combines a cache-aware subgraph generator and an efficient data loader, supported by CSR2-based GPU pruning and asynchronous CPU sampling, achieving substantial speedups while preserving accuracy (within 1% of baselines) on web-scale graphs. Overall, FreshGNN demonstrates practical, scalable GNN training with major IO reductions and broad applicability, including heterogeneous graphs.

Abstract

A key performance bottleneck when training graph neural network (GNN) models on large, real-world graphs is loading node features onto a GPU. Because GPU memory is limited, these features must be stored on alternative devices with slower access (e.g., CPU memory), which makes expensive data movement necessary. Moreover, the irregularity of graph structures contributes to poor data locality, which further exacerbates the problem. Consequently, existing frameworks capable of efficiently training large GNN models usually incur a significant accuracy degradation because of the shortcuts they rely on. To address these limitations, we instead propose FreshGNN, a general-purpose GNN mini-batch training framework that leverages a historical cache for storing and reusing GNN node embeddings instead of re-computing them by fetching raw features at every iteration. Critical to its success, the corresponding cache policy uses a combination of gradient-based and staleness criteria to separate embeddings that are relatively stable and can be cached from those that must be re-computed to reduce estimation errors and the subsequent loss of downstream accuracy. When paired with complementary system enhancements to support this selective historical cache, FreshGNN accelerates training on large graph datasets such as ogbn-papers100M and MAG240M by 3.4x to 20.5x and reduces memory access by 59%, with less than 1% impact on test accuracy.
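The selective cache policy described in the abstract can be illustrated with a minimal sketch. It is not the FreshGNN implementation: the class name `HistoricalCache`, the parameters `grad_threshold` and `max_staleness`, and the use of a fixed gradient-norm cutoff are all assumptions made for illustration, and the paper's actual admission criterion may differ. The sketch only shows the two ideas named above: embeddings with small gradients are treated as stable and admitted, and cached entries older than a staleness bound are evicted so they get re-computed.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class HistoricalCache:
    """Toy selective cache: admit low-gradient embeddings, evict stale ones."""

    grad_threshold: float  # hypothetical: largest gradient norm treated as "stable"
    max_staleness: int     # hypothetical: largest allowed age, in iterations
    _store: dict = field(default_factory=dict)  # node_id -> (embedding, stored_at)

    def lookup(self, node_id, iteration):
        """Return the cached embedding, or None if it is absent or too stale."""
        entry = self._store.get(node_id)
        if entry is None:
            return None
        embedding, stored_at = entry
        if iteration - stored_at > self.max_staleness:
            del self._store[node_id]  # stale: force re-computation
            return None
        return embedding

    def update(self, node_id, embedding, grad, iteration):
        """Admit the embedding if its gradient is small; otherwise drop any old copy."""
        grad_norm = sqrt(sum(g * g for g in grad))
        if grad_norm <= self.grad_threshold:
            self._store[node_id] = (list(embedding), iteration)
        else:
            self._store.pop(node_id, None)


cache = HistoricalCache(grad_threshold=0.1, max_staleness=2)
cache.update(7, [0.5, -0.2], grad=[0.01, 0.02], iteration=0)  # small gradient: admitted
cache.update(8, [1.0, 1.0], grad=[0.9, 0.4], iteration=0)     # large gradient: rejected
hit = cache.lookup(7, iteration=1)      # age 1 is within the staleness bound
miss = cache.lookup(8, iteration=1)     # node 8 was never admitted
expired = cache.lookup(7, iteration=5)  # age 5 exceeds max_staleness, entry evicted
```

In this sketch a cache hit means the node's raw features and its neighborhood need not be fetched for that iteration, which is where the IO savings claimed in the abstract would come from; a miss falls back to the normal re-computation path.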

Paper Structure (25 sections, 2 equations, 19 figures, 3 tables, 1 algorithm)

Introduction
Background and Motivation
Graph Neural Networks
Difficulty in Training Large-Scale GNNs
Existing Mini-Batch Training Overhauls
Historical Embeddings in Full Graph Training
Design of FreshGNN
Historical Embedding Cache
Cache Policy for Accuracy
Caching Stable Embeddings
Evicting Stale Embeddings
Resulting Adaptive Cache Size
Avoiding Initial Instability
Cache Policy for System Performance
Cache-Aware Subgraph Generator
...and 10 more sections

Figures (19)

Figure 1: Average estimation error in one training epoch for GAS on ogbn-products.
Figure 2: Test accuracy of mini-batch training algorithms (ClusterGCN, GNS, LADIES, GraphFM, GAS, and MariusGNN) and full graph training with historical embeddings (SANCUS) compared with the target accuracy achieved by (expensive) neighbor sampling on: (a) the relatively small ogbn-products graph, where the gap is modest for most algorithms, and (b) the larger ogbn-papers100M graph, where the gap grows significantly.
Figure 3: Distribution of cosine similarity between embeddings at iteration $t$ and embeddings at iteration $t-s$ during the training of a GCN model on ogbn-arxiv. Here $s=20$.
Figure 4: Illustration of historical embedding cache using an example mini-batch graph.
Figure 5: FreshGNN system workflow.
...and 14 more figures
