GLISP: A Scalable GNN Learning System by Exploiting Inherent Structural Properties of Graphs

Zhongshu Zhu; Bin Jing; Xiaopei Wan; Zhizhen Liu; Lei Liang; Jun Zhou

TL;DR

GLISP tackles the scalability challenge of deploying Graph Neural Networks on industrial-scale graphs with power-law degree distributions. It combines four pieces: AdaDNE, a vertex-cut partitioning algorithm that produces load-balanced partitions; a memory-efficient graph data structure; a Gather-Apply sampling service in which one-hop sampling requests for high-degree vertices are served cooperatively by multiple servers; and a layerwise inference engine with a two-level embedding cache and PDS-based graph reordering. Together these yield speedups of up to $6.53\times$ in training and $70.77\times$ in inference over existing GNN systems, on graphs with billions of vertices and edges. The approach enables scalable GNN training and inference under limited resources, making industrial-scale graph learning more practical.

Abstract

As a powerful tool for modeling graph data, Graph Neural Networks (GNNs) have received increasing attention in both academia and industry. Nevertheless, it is notoriously difficult to deploy GNNs on industrial-scale graphs, due to their huge data size and complex topological structures. In this paper, we propose GLISP, a sampling-based GNN learning system for industrial-scale graphs. By exploiting the inherent structural properties of graphs, such as power-law distribution and data locality, GLISP addresses the scalability and performance issues that arise at different stages of the graph learning process. GLISP consists of three core components: a graph partitioner, a graph sampling service, and a graph inference engine. The graph partitioner adopts the proposed vertex-cut graph partitioning algorithm AdaDNE to produce balanced partitions for power-law graphs, which is essential for sampling-based GNN systems. The graph sampling service employs a load-balancing design that allows the one-hop sampling request of a high-degree vertex to be handled by multiple servers. In conjunction with the memory-efficient data structure, this effectively improves efficiency and scalability. The graph inference engine splits the $K$-layer GNN into $K$ slices and caches the vertex embeddings produced by each slice in a data-locality-aware hybrid caching system for reuse, thus completely eliminating the redundant computation caused by the data dependencies of the graph. Extensive experiments show that GLISP achieves up to $6.53\times$ and $70.77\times$ speedups over existing GNN systems for training and inference tasks, respectively, and can scale to graphs with over 10 billion vertices and 40 billion edges with limited resources.
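To make the layerwise idea in the abstract concrete, the following is a minimal sketch, not GLISP's implementation. It assumes a hypothetical scalar layer `toy_layer` and an in-memory dictionary as the embedding cache (the paper's hybrid caching system is not modeled). It contrasts evaluating one slice at a time over all vertices with a naive per-vertex recursion, counting layer evaluations in a one-element list `calls`.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # vertex id -> list of neighbor ids


def toy_layer(h_self: float, h_neigh: List[float], weight: float) -> float:
    """Hypothetical GNN layer: scaled mean over the vertex and its neighbors."""
    values = [h_self] + h_neigh
    return weight * sum(values) / len(values)


def layerwise_inference(graph: Graph, features: Dict[int, float],
                        weights: List[float], calls: List[int]) -> Dict[int, float]:
    """Evaluate one slice (layer) at a time for all vertices, reusing cached embeddings."""
    cache = dict(features)  # layer-0 embeddings are the input features
    for w in weights:       # one slice per GNN layer
        nxt = {}
        for v, neigh in graph.items():
            calls[0] += 1   # count layer evaluations
            nxt[v] = toy_layer(cache[v], [cache[u] for u in neigh], w)
        cache = nxt         # embeddings of this slice feed the next one
    return cache


def recursive_inference(graph: Graph, features: Dict[int, float],
                        weights: List[float], v: int, k: int,
                        calls: List[int]) -> float:
    """Naive baseline: recompute the k-hop dependency tree of a single vertex."""
    if k == 0:
        return features[v]
    calls[0] += 1
    h_self = recursive_inference(graph, features, weights, v, k - 1, calls)
    h_neigh = [recursive_inference(graph, features, weights, u, k - 1, calls)
               for u in graph[v]]
    return toy_layer(h_self, h_neigh, weights[k - 1])
```

On a graph where every vertex has degree $d$, the recursion evaluates on the order of $(d+1)^{K-1}$ layers per vertex, while the layerwise loop evaluates exactly $K$ layers per vertex; this gap is the redundancy the abstract refers to.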

Paper Structure (20 sections, 7 equations, 15 figures, 5 tables, 4 algorithms)

Introduction
Preliminaries
Sampling based GNN Training
Graph Partition
Graph Reorder
System Design
Architecture
Graph Partitioner
Graph Sampling Service
Graph Inference Engine
Evaluation
Datasets
Graph Partitioner
Neighbor Sampling Performance
Model convergence and scalability
...and 5 more sections

Figures (15)

Figure 1: Schematic diagram of the distributed graph learning framework. A complete workflow consists of four steps: graph partitioning, launching graph sampling service, training and inference.
Figure 2: Schematic diagram of $K$-hop neighbor sampling, where $K=2$ and 2 seed vertices are selected.
Figure 3: Schematic diagram of vertex-cut and edge-cut partition.
Figure 4: The system architecture of GLISP.
Figure 5: The Gather-Apply based $K$-hop neighbor sampling algorithm. The key difference from existing frameworks is that the one-hop sampling requests for a boundary vertex are handled cooperatively by all servers on which it resides, which substantially balances the workload, as shown for the second seed vertex (tri-colored marker).
...and 10 more figures
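
The cooperative one-hop sampling described in the Figure 5 caption can be illustrated with a small sketch. This is an assumption-laden example, not the paper's algorithm: it assumes a vertex-cut layout in which each edge is stored on exactly one server, and it uses a random-key merge (each server keeps its `fanout` smallest random keys, and the client keeps the `fanout` smallest overall) to draw a uniform sample without replacement. The names `gather`, `apply_merge`, and `sample_one_hop` are hypothetical.

```python
import heapq
import random
from typing import Dict, List, Tuple

Partition = Dict[int, List[int]]  # vertex id -> neighbors stored on this server


def gather(partition: Partition, v: int, fanout: int,
           rng: random.Random) -> List[Tuple[float, int]]:
    """Server side: tag each local neighbor of v with a random key and
    return the `fanout` entries with the smallest keys."""
    keyed = [(rng.random(), u) for u in partition.get(v, [])]
    return heapq.nsmallest(fanout, keyed)


def apply_merge(partials: List[List[Tuple[float, int]]], fanout: int) -> List[int]:
    """Client side: merge the partial results and keep the `fanout` smallest keys."""
    merged = heapq.nsmallest(fanout, (item for part in partials for item in part))
    return [u for _, u in merged]


def sample_one_hop(partitions: List[Partition], v: int, fanout: int,
                   rng: random.Random) -> List[int]:
    """Ask every server that holds a replica of v, then merge the answers."""
    partials = [gather(p, v, fanout, rng) for p in partitions if v in p]
    return apply_merge(partials, fanout)
```

Because the globally smallest `fanout` keys are always among each server's locally smallest `fanout` keys, the merged result matches what a single server holding all neighbors would have sampled, while the per-server work is bounded by the size of its local neighbor list.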
