Deal: Distributed End-to-End GNN Inference for All Nodes
Shiyang Chen, Xiang Song, Vasiloudis Theodore, Hang Liu
TL;DR
Deal tackles end-to-end all-node GNN inference on billion-edge graphs by combining three techniques. (i) A sampling strategy collapses per-node ego networks into $k$ aligned $1$-hop graphs to maximize computation sharing. (ii) A lightweight 1-D co-partitioning of graph topology and feature tensors comes with memory-efficient distributed primitives. (iii) Communication is partitioned and pipelined, and feature preparation is fused into the first GNN primitive. Compared with the state of the art on real-world datasets, Deal achieves up to $7.70\times$ faster end-to-end inference and up to $21.05\times$ faster graph construction, and it scales effectively to up to 16 machines. By reusing shared computation across all nodes and tightly coupling graph construction with computation, Deal makes end-to-end GNN inference practical and scalable for very large graphs.
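The sharing idea in item (i) can be illustrated with a minimal sketch. It assumes a toy mean-aggregation layer and uniform neighbor sampling, and the helper names (`sample_one_hop`, `aggregate`, `layerwise`, `ego`) are hypothetical; this is not Deal's implementation. When every node samples its neighbors once per layer, the ego network of any target node is just a set of paths through these $k$ shared 1-hop graphs. Computing layer by layer therefore evaluates each (node, layer) pair once. Building a separate ego network per target recomputes the same sub-trees again and again.

```python
import random
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # node ID -> list of (sampled) neighbor IDs


def sample_one_hop(adj: Graph, fanout: int, rng: random.Random) -> Graph:
    """Sample up to `fanout` neighbors for every node once, giving one shared 1-hop graph."""
    return {v: rng.sample(nbrs, min(fanout, len(nbrs))) for v, nbrs in adj.items()}


def aggregate(self_val: float, nbr_vals: List[float]) -> float:
    """Toy GNN layer (assumption): mean over the node itself and its sampled neighbors."""
    vals = [self_val] + nbr_vals
    return sum(vals) / len(vals)


def layerwise(graphs: List[Graph], feats: Dict[int, float]) -> Tuple[Dict[int, float], int]:
    """Shared computation: every (node, layer) pair is evaluated exactly once."""
    h, ops = dict(feats), 0
    for g in graphs:  # graphs[i] is the sampled 1-hop graph used by layer i + 1
        h_next = {}
        for v, nbrs in g.items():
            h_next[v] = aggregate(h[v], [h[u] for u in nbrs])
            ops += 1
        h = h_next
    return h, ops


def ego(graphs: List[Graph], feats: Dict[int, float], v: int, layer: int, ops: List[int]) -> float:
    """Per-node ego network: recursively recomputes shared sub-trees for every target."""
    if layer == 0:
        return feats[v]
    ops[0] += 1
    g = graphs[layer - 1]
    self_val = ego(graphs, feats, v, layer - 1, ops)
    return aggregate(self_val, [ego(graphs, feats, u, layer - 1, ops) for u in g[v]])


if __name__ == "__main__":
    adj = {v: [(v + 1) % 6, (v + 2) % 6, (v + 3) % 6] for v in range(6)}
    rng = random.Random(0)
    graphs = [sample_one_hop(adj, fanout=2, rng=rng) for _ in range(2)]  # k = 2 layers
    feats = {v: float(v) for v in range(6)}
    shared, shared_ops = layerwise(graphs, feats)
    ego_ops = [0]
    per_node = {v: ego(graphs, feats, v, 2, ego_ops) for v in adj}
    assert shared == per_node  # identical embeddings from both strategies
    print(shared_ops, ego_ops[0])  # 12 24
```

Both strategies produce the same embeddings because they read the same sampled neighbors at each layer. The layer-wise count stays at $k \cdot N$ aggregations. The per-target ego count grows roughly exponentially with the number of layers as the fanout increases.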
Abstract
Graph Neural Networks (GNNs) are a new research frontier with a wide range of applications and successes. End-to-end inference for all nodes is common for GNN embedding models, which are widely adopted in applications such as recommendation and advertising. Sharing opportunities also arise in other GNN tasks (e.g., inference for a few nodes, and training). However, the potential for sharing in full-graph end-to-end inference remains largely underutilized, because traditional approaches that try to extract it incur overwhelming overheads or excessive memory usage. This paper introduces Deal, a distributed GNN inference system dedicated to end-to-end inference for all nodes on graphs with multiple billions of edges. First, we unveil and exploit an untapped sharing opportunity during sampling, and maximize the benefits of this sharing during the subsequent GNN computation. Second, we introduce memory-saving and communication-efficient distributed primitives for distributed inference based on a lightweight 1-D collaborative partitioning of the graph and the feature tensor. Third, we introduce partitioned, pipelined communication and fuse feature preparation with the first GNN primitive for end-to-end inference. With Deal, end-to-end inference time on real-world benchmark datasets is reduced by up to 7.70× and graph construction time by up to 21.05×, compared to the state of the art.
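The following generic sketch illustrates 1-D collaborative partitioning of topology and features. It assumes node IDs 0..n-1 are split into contiguous ranges, and it uses a toy mean-aggregation layer. The functions `owner`, `split`, `remote_requests`, and `partitioned_layer` are hypothetical names, the in-process dictionary lookup stands in for network communication, and Deal's actual primitives may differ. Each partition owns both the adjacency rows and the feature rows of its nodes. As a result, only the features of remote neighbors have to be exchanged, and those requests are batched per peer partition.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Part = Tuple[Dict[int, List[int]], Dict[int, float]]  # (adjacency rows, feature rows)


def owner(v: int, num_nodes: int, num_parts: int) -> int:
    """Partition that owns node v under a contiguous 1-D split of node IDs."""
    chunk = -(-num_nodes // num_parts)  # ceiling division
    return v // chunk


def split(adj: Dict[int, List[int]], feats: Dict[int, float], num_parts: int) -> List[Part]:
    """Co-partition: a node's adjacency row and feature row land on the same partition."""
    n = len(adj)
    parts: List[Part] = [({}, {}) for _ in range(num_parts)]
    for v in adj:
        p = owner(v, n, num_parts)
        parts[p][0][v] = adj[v]
        parts[p][1][v] = feats[v]
    return parts


def remote_requests(local_adj: Dict[int, List[int]], me: int, n: int, num_parts: int) -> Dict[int, Set[int]]:
    """Group non-local neighbor IDs by owning partition (one batched request per peer)."""
    req: Dict[int, Set[int]] = defaultdict(set)
    for nbrs in local_adj.values():
        for u in nbrs:
            p = owner(u, n, num_parts)
            if p != me:
                req[p].add(u)
    return req


def partitioned_layer(parts: List[Part], n: int) -> Dict[int, float]:
    """One mean-aggregation layer; remote features are 'fetched' from peer partitions."""
    out: Dict[int, float] = {}
    for me, (local_adj, local_feats) in enumerate(parts):
        fetched = dict(local_feats)
        for p, ids in remote_requests(local_adj, me, n, len(parts)).items():
            fetched.update({u: parts[p][1][u] for u in ids})  # simulated network exchange
        for v, nbrs in local_adj.items():
            vals = [fetched[v]] + [fetched[u] for u in nbrs]
            out[v] = sum(vals) / len(vals)
    return out
```

Usage: with 10 nodes and 3 partitions, partition 0 owns nodes 0 to 3, partition 1 owns 4 to 7, and partition 2 owns 8 and 9. The merged per-partition outputs equal a single-machine computation of the same layer.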
