An Experimental Comparison of Partitioning Strategies for Distributed Graph Neural Network Training
Nikolai Merkel, Daniel Stoll, Ruben Mayer, Hans-Arno Jacobsen
TL;DR
The paper addresses the challenge of efficiently training graph neural networks on large graphs by examining graph partitioning as a core optimization for distributed GNN training. It compares edge-partitioned (DistGNN) and vertex-partitioned (DistDGL) systems across 12 partitioners, multiple GNN models, and graph types, showing that high-quality partitioning can yield substantial speedups (up to about 10x) and major memory savings (up to ~85%), with partitioning time often amortized by faster training. Key insights reveal that replication factor and vertex balance critically shape communication and memory overhead, and that GNN hyper-parameters and scale-out influence partitioning effectiveness in distinct ways for edge vs. vertex partitioning. The findings advocate integrating partitioning-aware strategies into production GNN pipelines and encourage further research into partitioners tailored to the peculiarities of GNN workloads.
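To make the two partitioning metrics named above concrete, here is a minimal sketch of how replication factor and vertex balance are commonly computed for an edge partitioning. The definitions are the standard ones from the graph partitioning literature and are assumed, not confirmed, to match those used in the paper; the function names and the toy graph are hypothetical.

```python
def covered_vertices(edges, edge_parts, num_parts):
    """Return, per partition, the set of vertices touched by its edges."""
    covered = [set() for _ in range(num_parts)]
    for (u, v), p in zip(edges, edge_parts):
        covered[p].update((u, v))
    return covered


def replication_factor(edges, edge_parts, num_parts):
    """Average number of partitions in which each vertex appears.

    Assumed definition: sum of per-partition vertex counts divided by
    the number of distinct vertices. 1.0 means no vertex is replicated.
    """
    covered = covered_vertices(edges, edge_parts, num_parts)
    all_vertices = set().union(*covered)
    return sum(len(c) for c in covered) / len(all_vertices)


def vertex_balance(edges, edge_parts, num_parts):
    """Largest per-partition vertex count divided by the mean count.

    Assumed definition: 1.0 means perfectly balanced partitions.
    """
    sizes = [len(c) for c in covered_vertices(edges, edge_parts, num_parts)]
    return max(sizes) / (sum(sizes) / num_parts)


# Toy example (hypothetical): a 4-cycle split into two edge partitions.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
edge_parts = [0, 0, 1, 1]  # partition id assigned to each edge

rf = replication_factor(edges, edge_parts, num_parts=2)
vb = vertex_balance(edges, edge_parts, num_parts=2)
```

In this toy case vertices 0 and 2 appear in both partitions, so the replication factor is 1.5, while both partitions cover three vertices, giving a vertex balance of 1.0. A higher replication factor means more vertex copies to store and synchronize, which is why it relates to the memory and communication overheads described above.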
Abstract
Recently, graph neural networks (GNNs) have gained much attention as a growing area of deep learning capable of learning on graph-structured data. However, the computational and memory requirements for training GNNs on large-scale graphs make it necessary to distribute the training. A prerequisite for distributed GNN training is to partition the input graph into smaller parts that are distributed among multiple machines of a compute cluster. Although graph partitioning has been studied with regard to graph analytics and graph databases, its effect on GNN training performance is largely unexplored. As a consequence, it is unclear whether investing computational effort into high-quality graph partitioning pays off in GNN training scenarios. In this paper, we study the effectiveness of graph partitioning for distributed GNN training. Our study aims to understand how factors such as GNN parameters, mini-batch size, graph type, feature size, and scale-out factor influence the effectiveness of graph partitioning. We conduct experiments with two different GNN systems, one using vertex partitioning and one using edge partitioning. We find that high-quality graph partitioning is a very effective optimization for speeding up GNN training and reducing memory consumption. Furthermore, our results show that the invested partitioning time can quickly be amortized by reduced GNN training time, making it a relevant optimization for most GNN scenarios. Compared to research on distributed graph processing, our study reveals that graph partitioning plays an even more significant role in distributed GNN training, which motivates further research on the graph partitioning problem.
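The amortization argument in the abstract can be illustrated with a small break-even calculation. The sketch below is not taken from the paper: the function name is hypothetical and all timings are made-up placeholder values, chosen only to show the arithmetic.

```python
import math


def break_even_epochs(extra_partition_time_s, baseline_epoch_s, improved_epoch_s):
    """Number of training epochs after which a slower, higher-quality
    partitioner has paid for itself.

    extra_partition_time_s: additional partitioning time compared with a
        cheap baseline partitioner (seconds).
    baseline_epoch_s: epoch time with the baseline partitioning (seconds).
    improved_epoch_s: epoch time with the higher-quality partitioning (seconds).

    Returns None if the higher-quality partitioning does not reduce epoch time.
    """
    saving_per_epoch = baseline_epoch_s - improved_epoch_s
    if saving_per_epoch <= 0:
        return None
    return math.ceil(extra_partition_time_s / saving_per_epoch)


# Hypothetical numbers: 600 s of extra partitioning time, with epochs
# dropping from 100 s to 40 s, i.e. a saving of 60 s per epoch.
epochs_needed = break_even_epochs(600, 100, 40)
```

With these placeholder values the extra partitioning time is recovered after 10 epochs; any training run longer than that benefits from the better partitioning. The same calculation applies when a partitioning is reused across several training runs, for example during hyper-parameter search, in which case the savings accumulate over all runs.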
