
EmbedPart: Embedding-Driven Graph Partitioning for Scalable Graph Neural Network Training

Nikolai Merkel, Ruben Mayer, Volker Markl, Hans-Arno Jacobsen

Abstract

Graph Neural Networks (GNNs) are widely used for learning on graph-structured data, but scaling GNN training to massive graphs remains challenging. To enable scalable distributed training, graphs are divided into smaller partitions that are distributed across multiple machines such that inter-machine communication is minimized and computational load is balanced. In practice, existing partitioning approaches face a fundamental trade-off between partitioning overhead and partitioning quality. We propose EmbedPart, an embedding-driven partitioning approach that achieves both speed and quality. Instead of operating directly on irregular graph structures, EmbedPart leverages node embeddings produced during the actual GNN training workload and clusters these dense embeddings to derive a partitioning. EmbedPart achieves more than 100x speedup over Metis while maintaining competitive partitioning quality and accelerating distributed GNN training. Moreover, EmbedPart naturally supports graph updates and fast repartitioning, and can be applied to graph reordering to improve data locality and accelerate single-machine GNN training. By shifting partitioning from irregular graph structures to dense embeddings, EmbedPart enables scalable and high-quality graph data optimization.

Paper Structure

This paper contains 22 sections, 7 equations, 10 figures, 6 tables, 4 algorithms.

Figures (10)

  • Figure 1: (a) Traditional approaches perform graph partitioning or reordering as a preprocessing step based solely on the graph structure. (b) Our approach (EmbedPart) derives partitions from node embeddings produced during GNN training. These embeddings capture both structural and feature information and enable partitioning or reordering directly in the dense embedding space using scalable clustering.
  • Figure 2: Vertices are assigned to partitions. Edges connecting vertices of different partitions are cut.
  • Figure 3: Distributed GNN training pipeline: (1) the input graph is partitioned into $k$ partitions; (2) hyperparameters for the GNN are selected; (3) the model is trained in a distributed fashion; (4) the model is evaluated; steps (2)-(4) are repeated with the next hyperparameter set; (5) once the final model is trained, it is applied to downstream tasks.
  • Figure 4: EmbedPart overview: The input to EmbedPart is, as in GNN training, a graph with features attached to its nodes. In Phase 1, we train a GNN model (the actual workload) and obtain node embeddings. At any point, the process can transition to Phase 2, where clustering on the embeddings assigns nodes to clusters. To keep distributed training balanced, nodes (such as node 4) are migrated from overloaded to underloaded clusters. The resulting clusters are then used as the graph partitioning. Finally, the process returns to Phase 1 to continue GNN training with the improved data layout.
  • Figure 5: Loss and accuracy for reddit over epochs.
  • ...and 5 more figures
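The two-phase scheme in Figure 4 (cluster node embeddings, then rebalance overloaded clusters) can be sketched in a few lines. The following is a minimal illustration, not the paper's actual algorithm: it runs a plain k-means over the embedding matrix and then greedily migrates the worst-fitting nodes out of any cluster that exceeds a capacity bound, the role node 4 plays in the figure. The function name `embed_partition`, the `imbalance` factor, and the migration heuristic are all assumptions for illustration.

```python
import numpy as np

def embed_partition(embeddings, k, imbalance=1.05, iters=20, seed=0):
    """Sketch of embedding-driven partitioning (assumed details, not the
    paper's exact method): k-means on dense node embeddings, followed by
    greedy migration of nodes from overloaded to underloaded clusters."""
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    centers = embeddings[rng.choice(n, size=k, replace=False)]
    for _ in range(iters):
        # distance of every node to every centroid; assign to the nearest
        d = np.linalg.norm(embeddings[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for c in range(k):
            mask = assign == c
            if mask.any():
                centers[c] = embeddings[mask].mean(axis=0)
    # Phase-2 balancing: cap each partition at imbalance * n / k nodes
    cap = int(np.ceil(imbalance * n / k))
    sizes = np.bincount(assign, minlength=k)
    # visit worst-fitting nodes (farthest from their own centroid) first
    order = np.argsort(d[np.arange(n), assign])[::-1]
    for v in order:
        c = assign[v]
        if sizes[c] <= cap:
            continue  # cluster already within capacity
        # migrate v to the nearest cluster that still has spare capacity
        for t in np.argsort(d[v]):
            if t != c and sizes[t] < cap:
                assign[v] = t
                sizes[c] -= 1
                sizes[t] += 1
                break
    return assign
```

In practice the embeddings would come from the partially trained GNN itself, so no separate structural analysis of the irregular graph is needed; the clustering operates purely on dense vectors, which is what makes the approach fast compared to structure-based partitioners such as Metis.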