Graph Neural Network Training Systems: A Performance Comparison of Full-Graph and Mini-Batch
Saurabh Bajaj, Hojae Son, Juelin Liu, Hui Guan, Marco Serafini
TL;DR
The paper delivers a comprehensive, cross-class empirical comparison between full-graph and mini-batch GNN training systems. It demonstrates that mini-batch methods consistently reach target accuracy faster (lower time-to-accuracy) and often match or exceed full-graph accuracy when hyperparameters are tuned separately for each method. By analyzing both performance and cost, the work argues for evaluating across training paradigms using time-to-accuracy rather than epoch time and highlights practical optimization strategies like GPU caching and sampling choices. The findings suggest mini-batch training is generally more efficient in practice, while well-chosen full-graph optimizations can reduce overhead without sacrificing accuracy, guiding future system design and benchmarking in GNN training.
Abstract
Graph Neural Networks (GNNs) have gained significant attention in recent years due to their ability to learn representations of graph-structured data. Two common methods for training GNNs are mini-batch training and full-graph training. Since these two methods require different training pipelines and systems optimizations, two separate classes of GNN training systems emerged, each tailored for one method. Works that introduce systems belonging to a particular category predominantly compare them with other systems within the same category, offering limited or no comparison with systems from the other category. Some prior work also justifies its focus on one specific training method by arguing that it achieves higher accuracy than the alternative. The literature, however, has incomplete and contradictory evidence in this regard. In this paper, we provide a comprehensive empirical comparison of representative full-graph and mini-batch GNN training systems. We find that the mini-batch training systems consistently converge faster than the full-graph training ones across multiple datasets, GNN models, and system configurations. We also find that mini-batch training techniques converge to accuracy values similar to, and often higher than, those of full-graph training, showing that mini-batch sampling is not necessarily detrimental to accuracy. Our work highlights the importance of comparing systems across different classes, using time-to-accuracy rather than epoch time for performance comparison, and selecting appropriate hyperparameters for each training method separately.
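To make the distinction between epoch time and time-to-accuracy concrete, the following is a minimal sketch and is not taken from the paper. The function name `time_to_accuracy`, the log format, and all numbers are hypothetical and invented purely for illustration. It assumes time-to-accuracy means the cumulative wall-clock time until validation accuracy first reaches a chosen target.

```python
from typing import Optional, Sequence


def time_to_accuracy(
    epoch_times: Sequence[float],
    accuracies: Sequence[float],
    target: float,
) -> Optional[float]:
    """Return cumulative seconds until accuracy first reaches `target`.

    Returns None if the target is never reached in the given log.
    """
    elapsed = 0.0
    for seconds, accuracy in zip(epoch_times, accuracies):
        elapsed += seconds
        if accuracy >= target:
            return elapsed
    return None


# Hypothetical training logs (invented numbers, not results from the paper).
# System A has the shorter epoch time but needs more epochs to converge.
system_a_epoch_times = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
system_a_accuracies = [0.30, 0.45, 0.55, 0.62, 0.68, 0.71]

# System B has the longer epoch time but reaches the target in fewer epochs.
system_b_epoch_times = [1.5, 1.5, 1.5]
system_b_accuracies = [0.55, 0.66, 0.72]

target_accuracy = 0.70
tta_a = time_to_accuracy(system_a_epoch_times, system_a_accuracies, target_accuracy)
tta_b = time_to_accuracy(system_b_epoch_times, system_b_accuracies, target_accuracy)

print(f"System A: epoch time 1.0 s, time-to-accuracy {tta_a} s")  # 6.0
print(f"System B: epoch time 1.5 s, time-to-accuracy {tta_b} s")  # 4.5
```

In this invented log, the system with the shorter epoch time takes longer to reach the target accuracy, which is the kind of reversal that an epoch-time-only comparison would hide.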
