A Unified CPU-GPU Protocol for GNN Training

Yi-Chien Lin, Gangda Deng, Viktor Prasanna

TL;DR

This work addresses the inefficiency of conventional GPU-dominant GNN training on CPU-GPU platforms by introducing a Unified CPU-GPU protocol, which runs multiple GNN training processes on both the CPU and the GPU within a unified shared memory space, coordinated by a host and a dynamic load balancer. The approach preserves training semantics while improving resource utilization and reducing CPU-GPU data transfers, with speedups of up to 1.41x on a platform where the GPU moderately outperforms the CPU and up to 1.26x on a platform where the GPU significantly outperforms it. Key contributions include the GNN Process Manager, a Dynamic Load Balancer that estimates per-mini-batch workload and adapts during training, and GPU feature caching to reduce PCIe traffic. Evaluations on two platforms show that the gains hold across different samplers and models, and ablations of the individual optimizations show clear gains. The method is open-sourced for integration into PyG and DGL, which makes large-scale GNN training more practical for users with limited GPU access.
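To make the GPU feature caching idea concrete, here is a minimal sketch in plain Python. It assumes a degree-based policy (keep the features of the highest-degree nodes resident on the GPU), which is a common heuristic and not necessarily the policy used by the authors. The names `build_cache` and `transfer_plan` are hypothetical and are not part of PyG or DGL.

```python
def build_cache(degrees, capacity):
    """Pick the `capacity` highest-degree node IDs to keep resident on the GPU.

    Assumption: high-degree nodes are sampled most often, so caching their
    features avoids the most host-to-device copies.
    """
    ranked = sorted(range(len(degrees)), key=lambda n: degrees[n], reverse=True)
    return set(ranked[:capacity])


def transfer_plan(batch_nodes, cache):
    """Split a mini-batch's nodes into cache hits and nodes to copy over PCIe."""
    hits = [n for n in batch_nodes if n in cache]
    misses = [n for n in batch_nodes if n not in cache]
    return hits, misses


# Toy graph with five nodes; node i has degree degrees[i].
degrees = [5, 1, 9, 3, 7]
cache = build_cache(degrees, capacity=2)
hits, misses = transfer_plan([0, 2, 4, 1], cache)
print("served from GPU cache:", hits)
print("copied from host memory:", misses)
```

Only the nodes in `misses` would need their features moved across PCIe, so a higher hit rate means less CPU-GPU traffic per mini-batch.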

Abstract

Training a Graph Neural Network (GNN) model on large-scale graphs involves a high volume of data communication and computation. While state-of-the-art CPUs and GPUs feature high computing power, the standard GNN training protocol adopted in existing GNN frameworks cannot efficiently utilize the platform resources. To this end, we propose a novel Unified CPU-GPU protocol that can improve the resource utilization of GNN training on a CPU-GPU platform. The Unified CPU-GPU protocol instantiates multiple GNN training processes in parallel on both the CPU and the GPU. By allocating training processes on the CPU to perform GNN training collaboratively with the GPU, the proposed protocol improves the platform resource utilization and reduces the CPU-GPU data transfer overhead. Because the performance of a CPU and a GPU differs, we develop a novel load balancer that balances the workload dynamically between CPUs and GPUs during runtime. We evaluate our protocol using two representative GNN sampling algorithms, with two widely-used GNN models, on three datasets. Compared with the standard training protocol adopted in the state-of-the-art GNN frameworks, our protocol effectively improves resource utilization and reduces overall training time. On a platform where the GPU moderately outperforms the CPU, our protocol speeds up GNN training by up to 1.41x. On a platform where the GPU significantly outperforms the CPU, our protocol speeds up GNN training by up to 1.26x. Our protocol is open-sourced and can be seamlessly integrated into state-of-the-art GNN frameworks to accelerate GNN training. Our protocol particularly benefits users whose GPU access is limited by the high demand for GPUs.
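The dynamic load balancing described in the abstract can be illustrated with a minimal sketch. It assumes a simple throughput-proportional rule: each trainer's share of the next iteration is set in proportion to how fast it processed its share of the last one. This rule, and the names `Trainer`, `rebalance`, and `split_batch`, are illustrative assumptions and not the authors' actual load balancer.

```python
from dataclasses import dataclass


@dataclass
class Trainer:
    """A hypothetical training process pinned to one device."""
    name: str
    share: float  # fraction of each iteration's workload given to this trainer


def rebalance(trainers, elapsed, smoothing=0.5):
    """Update workload shares from the time each trainer took last iteration.

    Throughput is estimated as share / elapsed. The target share is
    proportional to throughput and is blended with the old share to damp
    oscillation (smoothing=1.0 jumps straight to the target).
    """
    if len(trainers) != len(elapsed):
        raise ValueError("need one timing per trainer")
    if any(t <= 0 for t in elapsed):
        raise ValueError("timings must be positive")
    throughput = [tr.share / t for tr, t in zip(trainers, elapsed)]
    total = sum(throughput)
    for tr, tp in zip(trainers, throughput):
        tr.share = (1 - smoothing) * tr.share + smoothing * (tp / total)
    return trainers


def split_batch(batch_size, trainers):
    """Split a batch of seed nodes into per-trainer sizes summing to batch_size."""
    sizes = [int(batch_size * tr.share) for tr in trainers]
    # Give the rounding remainder to the trainer with the largest share.
    largest = max(range(len(trainers)), key=lambda i: trainers[i].share)
    sizes[largest] += batch_size - sum(sizes)
    return sizes


# Start with an even split, then observe that the GPU finished 3x faster.
trainers = [Trainer("cpu", 0.5), Trainer("gpu", 0.5)]
rebalance(trainers, elapsed=[3.0, 1.0], smoothing=1.0)
print([(tr.name, round(tr.share, 3)) for tr in trainers])
print(split_batch(1024, trainers))
```

In this toy run the GPU ends up with roughly three quarters of each batch. A smaller `smoothing` value would move the shares more gradually, which may be preferable when per-iteration timings are noisy.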

Paper Structure (21 sections, 1 equation, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Standard training protocol used in state-of-the-art GNN frameworks
  • Figure 2: Target CPU-GPU platform
  • Figure 3: Training time breakdown of existing GNN library for various sampling algorithms and GNN models
  • Figure 4: Unified CPU-GPU training protocol
  • Figure 5: System Overview
  • ...and 2 more figures