Heta: Distributed Training of Heterogeneous Graph Neural Networks
Yuchen Zhong, Junwei Su, Chuan Wu, Minjie Wang
TL;DR
Heta addresses the communication bottleneck in distributed training of heterogeneous graph neural networks (HGNNs) by introducing a Relation-Aggregation-First (RAF) computation paradigm and a metagraph-based meta-partitioning strategy that confines cross-partition communication to partial relation aggregations. A miss-penalty-aware GPU feature cache further reduces data movement by prioritizing node types with higher cache-miss penalties, such as those with learnable features and optimizer states. The authors provide theoretical guarantees on reduced communication complexity, a scalable partitioning algorithm that runs in $O(|A|\log|A|)+O(|R|)$ time, and an extensive empirical evaluation showing up to $5.8\times$ epoch-time speedups with no loss in accuracy across multiple HGNN models and large heterogeneous graphs (HetGs). Collectively, Heta enables scalable, efficient distributed HGNN training for real-world HetGs with varied feature dimensions and incomplete node features.
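To make the idea of a miss-penalty-aware cache concrete, the following is a minimal sketch and not Heta's actual algorithm. It assumes each node type has a known per-node footprint, an access frequency, and a relative miss penalty, and it greedily fills the cache by expected penalty saved per cached byte. The class `NodeTypeProfile`, the function `allocate_cache`, the scoring rule, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeTypeProfile:
    name: str
    num_nodes: int
    bytes_per_node: int   # feature bytes (plus optimizer state if the feature is learnable)
    miss_penalty: float   # assumed relative cost of a single cache miss
    access_freq: float    # assumed average accesses per node per epoch


def allocate_cache(profiles: List[NodeTypeProfile], capacity_bytes: int) -> Dict[str, int]:
    """Greedily assign cache slots, ranking node types by penalty saved per byte."""
    ranked = sorted(
        profiles,
        key=lambda p: p.access_freq * p.miss_penalty / p.bytes_per_node,
        reverse=True,
    )
    allocation: Dict[str, int] = {}
    remaining = capacity_bytes
    for p in ranked:
        cached = min(p.num_nodes, remaining // p.bytes_per_node)
        allocation[p.name] = cached
        remaining -= cached * p.bytes_per_node
    return allocation


# Hypothetical profiles: "author" nodes carry learnable embeddings, so a miss
# is assumed to cost more (embedding plus optimizer state must be moved).
profiles = [
    NodeTypeProfile("paper", num_nodes=1000, bytes_per_node=512, miss_penalty=1.0, access_freq=2.0),
    NodeTypeProfile("author", num_nodes=500, bytes_per_node=768, miss_penalty=3.0, access_freq=2.0),
]
allocation = allocate_cache(profiles, capacity_bytes=500_000)
print(allocation)
```

Under these assumed numbers, the learnable node type is cached in full before any fixed-feature nodes are admitted, even though each of its entries takes more space.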
Abstract
Heterogeneous Graph Neural Networks (HGNNs) leverage diverse semantic relationships in Heterogeneous Graphs (HetGs) and have demonstrated remarkable learning performance in various applications. However, current distributed GNN training systems often overlook unique characteristics of HetGs, such as varying feature dimensions and the prevalence of missing features among nodes, leading to suboptimal performance or even incompatibility with distributed HGNN training. We introduce Heta, a framework designed to address the communication bottleneck in distributed HGNN training. Heta leverages the inherent structure of HGNNs (independent relation-specific aggregations for each relation, followed by a cross-relation aggregation) and advocates for a novel Relation-Aggregation-First computation paradigm. It performs relation-specific aggregations within graph partitions and then exchanges partial aggregations. This design, coupled with a new graph partitioning method that divides a HetG based on its graph schema and HGNN computation dependency, substantially reduces communication overhead. Heta further incorporates an innovative GPU feature caching strategy that accounts for the different cache-miss penalties associated with diverse node types. Comprehensive evaluations of various HGNN models and large heterogeneous graph datasets demonstrate that Heta outperforms state-of-the-art systems like DGL and GraphLearn by up to 5.8x and 2.3x in end-to-end epoch time, respectively.
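The Relation-Aggregation-First idea can be illustrated with a toy example for a single target node. This is a sketch under stated assumptions, not Heta's implementation: relation-specific aggregation is taken to be an element-wise mean, cross-relation aggregation an element-wise sum, and neighbor features are assumed to be already projected to a common hidden dimension. The relation names, the two-partition assignment, and all helper names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mean_aggregate(neighbor_feats: List[Vector]) -> Vector:
    """Relation-specific aggregation: element-wise mean over one relation's neighbors."""
    count = len(neighbor_feats)
    return [sum(column) / count for column in zip(*neighbor_feats)]


def vector_sum(vectors: List[Vector]) -> Vector:
    """Cross-relation aggregation: element-wise sum."""
    return [sum(column) for column in zip(*vectors)]


# Neighbors of one target node, grouped by relation (hypothetical data).
neighbors_by_relation: Dict[str, List[Vector]] = {
    "writes": [[1.0, 2.0], [3.0, 4.0]],
    "cites": [[0.5, 0.5]],
    "published_in": [[2.0, 0.0], [4.0, 2.0], [0.0, 1.0]],
}

# Assumed partitioning by relation: each partition owns whole relations.
relations_by_partition: Dict[int, List[str]] = {
    0: ["writes", "cites"],
    1: ["published_in"],
}

# Baseline: gather every neighbor feature in one place, then aggregate.
baseline = vector_sum([mean_aggregate(f) for f in neighbors_by_relation.values()])

# RAF: each partition aggregates its own relations locally and emits one
# partial vector; only these partials cross partition boundaries.
partials = [
    vector_sum([mean_aggregate(neighbors_by_relation[r]) for r in rels])
    for rels in relations_by_partition.values()
]
raf_result = vector_sum(partials)

# Values that partition 1 would send to partition 0 under each scheme.
raw_floats_sent = sum(len(f) for f in neighbors_by_relation["published_in"])
raf_floats_sent = len(partials[1])

print(baseline, raf_result, raw_floats_sent, raf_floats_sent)
```

Because the assumed cross-relation step is a sum, combining per-partition partials gives the same result as aggregating everything in one place, while the remote partition sends one hidden-dimension vector instead of every neighbor's features.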
