FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training

Kezhao Huang, Haitian Jiang, Minjie Wang, Guangxuan Xiao, David Wipf, Xiang Song, Quan Gan, Zengfeng Huang, Jidong Zhai, Zheng Zhang

TL;DR

FreshGNN tackles the memory-access bottleneck in large-scale GNN training by introducing a selective historical embedding cache. It uses a gradient-based stability signal and a staleness threshold to admit embeddings into a bidirectional cache that also coordinates with static features, guided by a cost model to maximize IO savings. The system combines a cache-aware subgraph generator and an efficient data loader, supported by CSR2-based GPU pruning and asynchronous CPU sampling, achieving substantial speedups while preserving accuracy (within 1% of baselines) on web-scale graphs. Overall, FreshGNN demonstrates practical, scalable GNN training with major IO reductions and broad applicability, including heterogeneous graphs.

Abstract

A key performance bottleneck when training graph neural network (GNN) models on large, real-world graphs is loading node features onto a GPU. Because GPU memory is limited, these features must be stored on devices with slower access (e.g., CPU memory), which makes expensive data movement unavoidable. Moreover, the irregularity of graph structures leads to poor data locality, which further exacerbates the problem. Consequently, existing frameworks capable of efficiently training large GNN models usually incur significant accuracy degradation because of the shortcuts they rely on. To address these limitations, we instead propose FreshGNN, a general-purpose GNN mini-batch training framework that leverages a historical cache to store and reuse GNN node embeddings instead of re-computing them by fetching raw features at every iteration. Critical to its success, the corresponding cache policy combines gradient-based and staleness criteria to separate embeddings that are relatively stable and can be cached from those that must be re-computed to reduce estimation errors and subsequent downstream accuracy loss. When paired with complementary system enhancements that support this selective historical cache, FreshGNN accelerates training on large graph datasets such as ogbn-papers100M and MAG240M by 3.4x to 20.5x and reduces memory access by 59%, with less than 1% influence on test accuracy.
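
To make the cache policy described in the abstract more concrete, the following is a minimal sketch of a selective historical cache, not the authors' implementation. It assumes that a small gradient norm marks an embedding as stable enough to admit, that a large gradient norm triggers eviction, and that an entry expires after a fixed number of iterations. The class name `SelectiveHistoricalCache`, the parameters `grad_threshold` and `max_staleness`, and their default values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class CacheEntry:
    embedding: List[float]
    stored_at: int  # iteration at which the embedding was written to the cache


@dataclass
class SelectiveHistoricalCache:
    """Toy cache combining a gradient criterion with a staleness criterion.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    grad_threshold: float = 0.1  # hypothetical: admit only if grad norm <= this
    max_staleness: int = 20      # hypothetical: entries older than this expire
    entries: Dict[int, CacheEntry] = field(default_factory=dict)

    def lookup(self, node_id: int, iteration: int) -> Optional[List[float]]:
        """Return a cached embedding, or None if it is absent or too stale."""
        entry = self.entries.get(node_id)
        if entry is None:
            return None
        if iteration - entry.stored_at > self.max_staleness:
            # Staleness criterion: drop the entry so it gets re-computed.
            del self.entries[node_id]
            return None
        return entry.embedding

    def update(self, node_id: int, embedding: Sequence[float],
               grad_norm: float, iteration: int) -> None:
        """Admit a stable embedding or evict an unstable one."""
        if grad_norm <= self.grad_threshold:
            # Gradient criterion: a small gradient suggests a stable embedding.
            self.entries[node_id] = CacheEntry(list(embedding), iteration)
        else:
            # A large gradient suggests the embedding is still changing.
            self.entries.pop(node_id, None)


cache = SelectiveHistoricalCache(grad_threshold=0.1, max_staleness=3)
cache.update(1, [1.0, 2.0], grad_norm=0.05, iteration=0)  # admitted
cache.update(2, [3.0, 4.0], grad_norm=0.50, iteration=0)  # rejected
hit = cache.lookup(1, iteration=2)       # still fresh
miss = cache.lookup(2, iteration=2)      # never admitted
expired = cache.lookup(1, iteration=4)   # 4 iterations old, exceeds max_staleness
```

In a training loop, a cache hit would let the trainer skip fetching the raw features of that node's neighborhood, which is where the reduction in memory access comes from.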

Paper Structure

This paper contains 25 sections, 2 equations, 19 figures, 3 tables, 1 algorithm.

Figures (19)

  • Figure 1: Average estimation error in one training epoch for GAS on ogbn-products.
  • Figure 2: Test accuracy of mini-batch training algorithms (ClusterGCN, GNS, LADIES, GraphFM, GAS, and MariusGNN) and full graph training with historical embedding (SANCUS) compared with the target accuracy achieved by (expensive) neighbor sampling on: (a) relatively small ogbn-products graph where the gap is modest for most algorithms, and (b) larger ogbn-papers100M graph where the gap grows significantly.
  • Figure 3: Distribution of cosine similarity between embeddings at iteration $t$ and embeddings at iteration $t-s$ during the training of a GCN model on ogbn-arxiv. Here $s=20$.
  • Figure 4: Illustration of historical embedding cache using an example mini-batch graph.
  • Figure 5: FreshGNN system workflow.
  • ...and 14 more figures
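
The quantity summarized in Figure 3, the cosine similarity between a node's embedding at iteration $t$ and at iteration $t-s$, can be computed along the following lines. This is only a sketch under assumptions: the `history` layout (a dictionary mapping iteration number to per-node embeddings) and the function names are hypothetical, and the toy vectors are not data from the paper.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embedding_similarity(history: Dict[int, Dict[int, Sequence[float]]],
                         t: int, s: int) -> Dict[int, float]:
    """Per-node cosine similarity between iteration t and iteration t - s.

    `history[i][node_id]` is assumed to hold the embedding of `node_id`
    recorded at iteration `i` (a hypothetical logging format).
    """
    current, past = history[t], history[t - s]
    return {
        node_id: cosine_similarity(current[node_id], past[node_id])
        for node_id in current
        if node_id in past
    }


# Toy example with s = 20: node 0 only changes scale, node 1 rotates by 90 degrees.
history = {
    0: {0: [1.0, 0.0], 1: [1.0, 0.0]},
    20: {0: [2.0, 0.0], 1: [0.0, 3.0]},
}
sims = embedding_similarity(history, t=20, s=20)
```

A similarity close to 1 indicates an embedding that barely changed direction over the last $s$ iterations, which is the kind of embedding a historical cache can reuse with little estimation error.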