Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms
Tong Qiao, Ao Zhou, Yingjie Qi, Yiou Wang, Han Wan, Jianlei Yang, Chunming Hu
TL;DR
This paper tackles the high cost and inflexibility of GNN training on CPU-GPU heterogeneous platforms by introducing A3GNN, which combines locality-aware sampling, adaptive multi-level parallelism scheduling, and task- and hardware-oriented auto-tuning driven by reinforcement learning. A surrogate performance model is learned offline to predict throughput, memory footprint, and accuracy, enabling rapid design-space exploration under hardware constraints. The approach yields significant speedups (up to 3.95X in some settings) and scales across diverse datasets while keeping memory usage and accuracy loss under control. The work shows how hardware-aware, automated optimization can make large-scale GNN training practical on commodity hardware for both researchers and practitioners.
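As an illustration of how a surrogate model can drive constrained design-space exploration, the sketch below scores every configuration with a cheap predictor and keeps the highest-throughput one that fits a memory budget and an accuracy floor. This is a minimal sketch under stated assumptions, not A3GNN's method: the knobs (`batch_size`, `num_sampler_workers`, `cache_ratio`) and the closed-form `toy_surrogate` formulas are hypothetical stand-ins for a learned model, and exhaustive enumeration replaces the paper's reinforcement-learning search, which only works for small spaces.

```python
from itertools import product
from typing import NamedTuple


class Config(NamedTuple):
    batch_size: int
    num_sampler_workers: int
    cache_ratio: float  # hypothetical knob: fraction of node features cached on the GPU


class Prediction(NamedTuple):
    throughput: float  # samples per second (arbitrary units in this toy model)
    memory_gb: float   # predicted GPU memory footprint
    accuracy: float    # predicted final accuracy


def toy_surrogate(cfg: Config) -> Prediction:
    """Hypothetical closed-form stand-in for a learned surrogate model."""
    throughput = cfg.batch_size * cfg.num_sampler_workers ** 0.5 * (1 + cfg.cache_ratio)
    memory_gb = 0.004 * cfg.batch_size + 6.0 * cfg.cache_ratio
    accuracy = 0.80 - 0.00002 * cfg.batch_size
    return Prediction(throughput, memory_gb, accuracy)


def search(space, surrogate, mem_budget_gb, min_accuracy):
    """Enumerate the space; keep the fastest config that satisfies both constraints."""
    best_cfg, best_pred = None, None
    # Iterate knob values in the same order as the Config fields.
    for values in product(*(space[name] for name in Config._fields)):
        cfg = Config(*values)
        pred = surrogate(cfg)
        if pred.memory_gb > mem_budget_gb or pred.accuracy < min_accuracy:
            continue  # violates a hardware or quality constraint
        if best_pred is None or pred.throughput > best_pred.throughput:
            best_cfg, best_pred = cfg, pred
    return best_cfg, best_pred


SPACE = {
    "batch_size": [256, 1024, 4096],
    "num_sampler_workers": [1, 4, 8],
    "cache_ratio": [0.0, 0.5, 1.0],
}

if __name__ == "__main__":
    # An 11 GB budget roughly matches a single 2080Ti's memory.
    cfg, pred = search(SPACE, toy_surrogate, mem_budget_gb=11.0, min_accuracy=0.75)
    print(cfg, pred)
```

Because each surrogate call is cheap, many candidates can be screened without running real training jobs; a learned search policy becomes worthwhile once the space is too large to enumerate.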
Abstract
Graph Neural Networks (GNNs) have been widely adopted due to their strong performance. However, GNN training often relies on expensive, high-performance computing platforms, which limits accessibility for many tasks. Profiling of representative GNN workloads indicates that substantial efficiency gains are possible on resource-constrained devices by fully exploiting the available resources. This paper introduces A3GNN, a framework for affordable, adaptive, and automatic GNN training on heterogeneous CPU-GPU platforms. It improves resource utilization through locality-aware sampling and fine-grained parallelism scheduling. Moreover, it uses reinforcement learning to explore the design space and achieve Pareto-optimal trade-offs among throughput, memory footprint, and accuracy. Experiments show that A3GNN can bridge the performance gap between commodity and high-end hardware, allowing seven NVIDIA 2080Ti GPUs to outperform two A100 GPUs by up to 1.8X in throughput with minimal accuracy loss.
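To make "Pareto-optimal trade-offs among throughput, memory footprint, and accuracy" concrete, the sketch below filters candidate configurations down to the non-dominated set, maximizing throughput and accuracy while minimizing memory. It is a generic illustration of Pareto dominance, not the paper's code; the `(throughput, memory_gb, accuracy)` tuples are made-up values chosen only to show which points survive.

```python
from typing import List, Tuple

# (throughput, memory_gb, accuracy): throughput and accuracy are maximized, memory minimized.
Point = Tuple[float, float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates (O(n^2), fine for small candidate sets)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates, for illustration only.
candidates: List[Point] = [
    (1000.0, 8.0, 0.78),
    (1200.0, 10.0, 0.77),
    (900.0, 9.0, 0.76),    # dominated by the first point
    (1200.0, 10.5, 0.77),  # dominated by the second point (same speed and accuracy, more memory)
    (800.0, 6.0, 0.75),    # slowest, but uses the least memory
]

if __name__ == "__main__":
    for point in pareto_front(candidates):
        print(point)
```

Every surviving point represents a different trade-off; choosing among them still requires a preference, such as a hard memory limit for a given GPU.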
