Heta: Distributed Training of Heterogeneous Graph Neural Networks

Yuchen Zhong; Junwei Su; Chuan Wu; Minjie Wang

TL;DR

Heta addresses the communication bottleneck in distributed training of heterogeneous graph neural networks (HGNNs) by introducing a Relation-Aggregation-First (RAF) computation paradigm and a metagraph-based meta-partitioning strategy that confines cross-partition communication to partial relation aggregations. A miss-penalty-aware GPU feature cache further reduces data movement by prioritizing node types with higher cache-miss penalties, accounting for learnable features and their optimizer states. The authors provide theoretical guarantees on reduced communication complexity, a scalable partitioning algorithm with $O(|A|\log|A|)+O(|R|)$ time, and an extensive empirical evaluation showing up to $5.8\times$ epoch-time speedups with no loss in accuracy across multiple HGNN models and large heterogeneous graphs (HetGs). Collectively, Heta enables scalable, efficient distributed HGNN training on real-world heterogeneous graphs with varied feature dimensions and incomplete node features.
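To make "miss-penalty-aware" concrete, the Python sketch below ranks node types by how many transferred bytes each cached byte saves and then fills a fixed cache budget greedily. The cost model is an assumption for illustration, not Heta's policy: `NodeType`, `hotness` (expected accesses per node), `optim_slots` (optimizer-state vectors per learnable embedding, e.g. two for Adam), and the read-plus-write-back penalty charged to learnable features are all hypothetical.

```python
from dataclasses import dataclass

BYTES_PER_FLOAT = 4

@dataclass
class NodeType:              # hypothetical description of one node type
    name: str
    num_nodes: int
    feat_dim: int            # floats per feature vector
    hotness: float           # assumed expected accesses per node per epoch
    learnable: bool          # learnable embeddings also carry optimizer state

def cache_footprint_bytes(t: NodeType, optim_slots: int = 2) -> int:
    """Cache space one node needs: its feature, plus optimizer state if learnable."""
    floats = t.feat_dim * (1 + optim_slots) if t.learnable else t.feat_dim
    return floats * BYTES_PER_FLOAT

def miss_penalty_bytes(t: NodeType, optim_slots: int = 2) -> int:
    """Hypothetical bytes moved on one miss: a plain feature is read once;
    a learnable one is read with its optimizer state and later written back."""
    if t.learnable:
        return 2 * cache_footprint_bytes(t, optim_slots)
    return cache_footprint_bytes(t, optim_slots)

def allocate_cache(types: list[NodeType], capacity_bytes: int,
                   optim_slots: int = 2) -> dict[str, int]:
    """Greedily give cache space to the node types that save the most
    transferred bytes per cached byte; returns the number of nodes cached per type."""
    def saving_per_byte(t: NodeType) -> float:
        return (t.hotness * miss_penalty_bytes(t, optim_slots)
                / cache_footprint_bytes(t, optim_slots))

    plan, remaining = {}, capacity_bytes
    for t in sorted(types, key=saving_per_byte, reverse=True):
        per_node = cache_footprint_bytes(t, optim_slots)
        n = min(t.num_nodes, remaining // per_node)
        plan[t.name] = n
        remaining -= n * per_node
    return plan

types = [
    NodeType("paper", num_nodes=1000, feat_dim=128, hotness=1.0, learnable=False),
    NodeType("author", num_nodes=1000, feat_dim=64, hotness=1.0, learnable=True),
]
print(allocate_cache(types, capacity_bytes=1_000_000))
# {'author': 1000, 'paper': 453}
```

With these made-up sizes, the learnable `author` embeddings are cached first even though their vectors are shorter, because a miss on them costs more traffic than the cache space they occupy; the remaining budget goes to `paper` features. A real cache would also rank individual nodes within a type; this sketch only splits the budget across types.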

Abstract

Heterogeneous Graph Neural Networks (HGNNs) leverage diverse semantic relationships in Heterogeneous Graphs (HetGs) and have demonstrated remarkable learning performance in various applications. However, current distributed GNN training systems often overlook the unique characteristics of HetGs, such as varying feature dimensions and the prevalence of missing features among nodes, leading to suboptimal performance or even incompatibility with distributed HGNN training. We introduce Heta, a framework designed to address the communication bottleneck in distributed HGNN training. Heta leverages the inherent structure of HGNNs (independent relation-specific aggregations, one per relation, followed by a cross-relation aggregation) and advocates a novel Relation-Aggregation-First computation paradigm. It performs relation-specific aggregations within graph partitions and then exchanges only the partial aggregations. This design, coupled with a new graph partitioning method that divides a HetG based on its graph schema and HGNN computation dependency, substantially reduces communication overhead. Heta further incorporates an innovative GPU feature caching strategy that accounts for the different cache-miss penalties associated with diverse node types. Comprehensive evaluations of various HGNN models and large heterogeneous graph datasets demonstrate that Heta outperforms state-of-the-art systems like DGL and GraphLearn by up to 5.8× and 2.3× in end-to-end epoch time, respectively.
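Partitioning "based on its graph schema" means the unit of placement can be a relation of the metagraph rather than an individual node, so that each relation-specific aggregation runs inside one partition. The Python sketch below illustrates only that idea, using a plain longest-processing-time (LPT) greedy balancer over made-up per-relation edge counts. The function name `assign_relations` and the balancing rule are assumptions; the sketch ignores the HGNN computation dependency that Heta's meta-partitioning also takes into account.

```python
import heapq

def assign_relations(relation_edges: dict[str, int], k: int) -> dict[int, list[str]]:
    """Place each whole relation on one of k partitions.

    Relations are visited from largest to smallest (ties broken by name) and
    each goes to the partition with the fewest edges so far (LPT greedy).
    """
    heap = [(0, p) for p in range(k)]          # (edges assigned, partition id)
    heapq.heapify(heap)
    parts: dict[int, list[str]] = {p: [] for p in range(k)}
    for rel, n_edges in sorted(relation_edges.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(heap)
        parts[p].append(rel)
        heapq.heappush(heap, (load + n_edges, p))
    return parts

# Made-up edge counts for relations named after the ogbn-mag schema.
edges = {"writes": 7, "cites": 5, "has_topic": 4, "affiliated_with": 1}
print(assign_relations(edges, k=2))
# {0: ['writes', 'affiliated_with'], 1: ['cites', 'has_topic']}
```

Under this placement, a `paper` target would receive one partial aggregation from each partition (its `writes` term from partition 0, its `cites` and `has_topic` terms from partition 1) instead of the raw features of every neighbor.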

Paper Structure (20 sections, 3 theorems, 3 equations, 16 figures, 2 tables, 2 algorithms)

Introduction
Background and Motivation
Heterogeneous Graph Neural Networks
Distributed Training of HGNNs
Opportunities & Challenges
System Overview
RAF HGNN Computation
Meta-Partitioning of HetG
GPU Feature Cache
Implementation
Evaluation
Methodology
Overall Performance
Training Time Breakdown
Meta-Partitioning Efficiency
(5 further sections not listed)

Key Result

Proposition 1

Let $\mathbf{h}_v^{(\mathrm{vanilla})}$ and $\mathbf{h}_v^{(\mathrm{RAF})}$ be the embeddings of a target node $v$ obtained with the vanilla execution model and the RAF paradigm, respectively. Then $\mathbf{h}_v^{(\mathrm{vanilla})} = \mathbf{h}_v^{(\mathrm{RAF})}$.
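The equivalence is easiest to see for a layer whose cross-relation aggregation is a sum, as in R-GCN: the per-relation terms can be added in any grouping, so adding them first within each partition and then across partitions gives the same result. The NumPy sketch below checks this numerically for one target node. The layer form (mean over neighbors, a weight `W[r]` per relation, summed and passed through `tanh`), the relation names, and the partition assignments are illustrative assumptions, not the paper's proof.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, out_dim = 8, 4
relations = ["writes", "cites", "has_topic"]          # hypothetical relation names
# Made-up neighbor features of one target node v, grouped by relation.
neighbors = {r: rng.normal(size=(rng.integers(1, 6), in_dim)) for r in relations}
W = {r: rng.normal(size=(in_dim, out_dim)) for r in relations}

def relation_agg(r):
    """Relation-specific aggregation: mean of neighbor features, then W_r."""
    return neighbors[r].mean(axis=0) @ W[r]

def vanilla():
    # All raw neighbor features are fetched to v's machine, then aggregated.
    return np.tanh(sum(relation_agg(r) for r in relations))

def raf(partition_of):
    # Each partition aggregates only the relations it owns...
    partials = {}
    for r in relations:
        p = partition_of[r]
        partials[p] = partials.get(p, 0) + relation_agg(r)
    # ...and only the partial sums (out_dim floats each) are combined.
    return np.tanh(sum(partials.values()))

print(np.allclose(vanilla(), raf({"writes": 0, "cites": 1, "has_topic": 1})))  # True
```

For a non-additive cross-relation aggregator, such as attention over relations, the exchanged partials would need to carry enough extra information (for example, unnormalized scores) to be combined exactly; the sketch covers only the additive case.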

Figures (16)

Figure 1: Illustration of R-GAT model architecture.
Figure 2: Metagraph and decomposed mono-relation subgraphs of the ogbn-mag dataset hu2020open.
Figure 3: Vanilla execution model of existing distributed GNN training systems that support HGNN training.
Figure 4: Percentage of the epoch time spent on each stage: training R-GCN on three datasets with DGL.
Figure 5: Heta's design overview.
(11 further figures not listed)

Theorems & Definitions (3)

Proposition 1: Mathematical Equivalence
Proposition 2: Communication Complexity
Proposition 3: Communication Reduction
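The statements of Propositions 2 and 3 are not reproduced here, but the intuition behind the reduction can be shown with back-of-the-envelope arithmetic for a single target node. The vanilla model ships the raw features of every remote neighbor, whose dimensions differ by node type, while RAF ships one partial aggregation of the hidden size per remote partition. The Python sketch below uses made-up counts and dimensions and counts forward-pass bytes for one layer only; the function names are hypothetical.

```python
BYTES_PER_FLOAT = 4  # float32

def vanilla_bytes(remote_neighbors: dict[str, int], feat_dim: dict[str, int]) -> int:
    """Vanilla model: every remote neighbor's raw feature crosses the network."""
    return sum(n * feat_dim[t] for t, n in remote_neighbors.items()) * BYTES_PER_FLOAT

def raf_bytes(num_remote_partitions: int, hidden_dim: int) -> int:
    """RAF: each remote partition sends one partial aggregation of size hidden_dim."""
    return num_remote_partitions * hidden_dim * BYTES_PER_FLOAT

# Hypothetical target paper: 20 remote authors (128-d), 10 remote fields (256-d).
v = vanilla_bytes({"author": 20, "field_of_study": 10},
                  {"author": 128, "field_of_study": 256})
r = raf_bytes(num_remote_partitions=1, hidden_dim=64)
print(v, r, v // r)  # 20480 256 80
```

In this model the RAF cost does not grow with the number of remote neighbors or their feature widths, which is why wide and varied feature dimensions in HetGs favor exchanging partial aggregations.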
