MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs

Aishwarya Sarkar, Sayan Ghosh, Nathan R. Tallent, Ali Jannesari

TL;DR

This paper proposes practical trade-offs for reducing the sampling and communication overheads of representation learning on distributed graphs by developing a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework.

Abstract

Graph Neural Networks (GNNs) are indispensable for learning from graph-structured data, yet their rising computational costs, especially on massively connected graphs, pose significant challenges for execution performance. To tackle this, distributed-memory solutions, such as partitioning the graph to concurrently train multiple replicas of a GNN, are used in practice. However, approaches that require a partitioned graph usually suffer from communication overhead and load imbalance, even under optimal partitioning and communication strategies, because of irregularities in neighborhood minibatch sampling. This paper proposes practical trade-offs for reducing the sampling and communication overheads of representation learning on distributed graphs (using the popular GraphSAGE architecture) by developing a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework. The scheme demonstrates about 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer for various OGB datasets.

Paper Structure

This paper contains 30 sections, 9 equations, 14 figures, 6 tables, 2 algorithms.

Figures (14)

  • Figure 1: Extending DistDGL with our continuous prefetch and eviction scheme.
  • Figure 2: DistDGL architecture showcasing multiple samplers and trainers across distinct partitions, each represented by a different color.
  • Figure 3: Workflow of proposed prefetch and eviction scheme.
  • Figure 4: Visual representation of $S_E$ and $S_A$ scoreboards showing how prefetched nodes are selected for eviction and new candidates are chosen for replacing them. Highlighted borders represent swapping.
  • Figure 5: Quadrants showing four different tradeoff scenarios using various combinations of decay factor ($\gamma$) and eviction interval ($\Delta$).
  • ...and 9 more figures