Characterizing and Understanding HGNN Training on GPUs

Dengke Han; Mingyu Yan; Xiaochun Ye; Dongrui Fan

TL;DR

This study quantifies and analyzes two mainstream HGNN training scenarios, single-GPU training and multi-GPU distributed training, and identifies the performance bottlenecks in each scenario along with their underlying causes.

Abstract

Owing to their remarkable representation capabilities for heterogeneous graph data, Heterogeneous Graph Neural Networks (HGNNs) have been widely adopted in many critical real-world domains such as recommendation systems and medical analysis. Before an HGNN can be applied in practice, however, finding the model parameters best suited to a specific task requires extensive training, which is time-consuming and costly. To make HGNN training more efficient, it is essential to characterize and analyze the execution semantics and patterns within the training process and so identify its performance bottlenecks. In this study, we conduct an in-depth quantification and analysis of two mainstream HGNN training scenarios: single-GPU training and multi-GPU distributed training. Based on the characterization results, we disclose the performance bottlenecks and their underlying causes in each training scenario and provide optimization guidelines from both software and hardware perspectives.

Paper Structure (50 sections, 14 figures, 5 tables)

Introduction
Background
Heterogeneous Graphs and Semantic Graphs
Heterogeneous Graph Neural Networks
HGNN Training
Full-batch and Mini-batch Training
Single-node and Distributed Training
Workload Distribution
Characterization Methodology
Experimental Setup
Platforms
HGNN Models
Benchmark Datasets
Evaluation Methods
Single-GPU Training
...and 35 more sections

Figures (14)

Figure 1: Illustration of HetGs and HGNNs.
Figure 2: Illustration of HGNN training: (a) SGB stage; (b) Mini-batch sampling process; (c) Training process on a single computing node; (d) Distributed training process.
Figure 3: Time breakdown of HGNN training by phase: (a) The whole training process; (b) Forward; (c) Backward.
Figure 4: Time breakdown of HGNN training by kernel: (a) Forward; (b) Backward ("NONE" indicates that there are no CUDA kernels invoked here).
Figure 5: The roofline model for kernels under single-precision floating-point operations.
...and 9 more figures
