Graph Neural Network Training Systems: A Performance Comparison of Full-Graph and Mini-Batch

Saurabh Bajaj; Hojae Son; Juelin Liu; Hui Guan; Marco Serafini

TL;DR

The paper delivers a comprehensive, cross-class empirical comparison between full-graph and mini-batch GNN training systems. It demonstrates that mini-batch methods consistently reach target accuracy faster (lower time-to-accuracy) and often match or exceed full-graph accuracy when hyperparameters are tuned separately for each method. By analyzing both performance and cost, the work argues for evaluating across training paradigms using time-to-accuracy rather than epoch time and highlights practical optimization strategies like GPU caching and sampling choices. The findings suggest mini-batch training is generally more efficient in practice, while well-chosen full-graph optimizations can reduce overhead without sacrificing accuracy, guiding future system design and benchmarking in GNN training.

Abstract

Graph Neural Networks (GNNs) have gained significant attention in recent years due to their ability to learn representations of graph-structured data. Two common methods for training GNNs are mini-batch training and full-graph training. Since these two methods require different training pipelines and systems optimizations, two separate classes of GNN training systems emerged, each tailored for one method. Works that introduce systems belonging to a particular category predominantly compare them with other systems within the same category, offering limited or no comparison with systems from the other category. Some prior work also justifies its focus on one specific training method by arguing that it achieves higher accuracy than the alternative. The literature, however, has incomplete and contradictory evidence in this regard. In this paper, we provide a comprehensive empirical comparison of representative full-graph and mini-batch GNN training systems. We find that the mini-batch training systems consistently converge faster than the full-graph training ones across multiple datasets, GNN models, and system configurations. We also find that mini-batch training techniques converge to accuracy values that are similar to, and often higher than, those of full-graph training, showing that mini-batch sampling is not necessarily detrimental to accuracy. Our work highlights the importance of comparing systems across different classes, using time-to-accuracy rather than epoch time for performance comparison, and selecting appropriate hyperparameters for each training method separately.
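The distinction between time-to-accuracy and epoch time can be made concrete with a small sketch. The Python below is an illustration only, not the paper's measurement code: the function names, the log format (pairs of elapsed seconds and accuracy, one per epoch), and all numbers are assumptions invented for this example, and they are not results from the paper.

```python
from typing import Optional, Sequence, Tuple

# One log entry per epoch: (elapsed wall-clock seconds, accuracy after that epoch).
# This log format is an assumption made for the example.
Log = Sequence[Tuple[float, float]]


def time_to_accuracy(log: Log, target: float) -> Optional[float]:
    """Return the elapsed time at which accuracy first reaches `target`, or None."""
    for elapsed_s, accuracy in log:
        if accuracy >= target:
            return elapsed_s
    return None


def mean_epoch_time(log: Log) -> float:
    """Return the average wall-clock time per epoch over the whole log."""
    return log[-1][0] / len(log)


# Hypothetical logs with made-up numbers (NOT taken from the paper).
# System A has short epochs but needs many of them to reach the target.
system_a_log = [(1.0, 0.30), (2.0, 0.45), (3.0, 0.55), (4.0, 0.62),
                (5.0, 0.66), (6.0, 0.69), (7.0, 0.71)]
# System B has long epochs but needs only a few of them.
system_b_log = [(3.0, 0.58), (6.0, 0.72)]

target = 0.70
for name, log in [("A", system_a_log), ("B", system_b_log)]:
    print(name, mean_epoch_time(log), time_to_accuracy(log, target))
```

In this made-up example, system A has the shorter epoch time yet reaches the target accuracy later than system B, which is the kind of discrepancy that makes epoch time alone an unreliable basis for comparing systems across training methods.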

Paper Structure (13 sections, 5 equations, 14 figures, 25 tables)

Introduction
Background and Motivation
Graph Neural Networks
Full-Graph vs. Mini-Batch Training: Different Systems for Different Pipelines
Motivation
Experimental Methodology
Performance Evaluation
Time-to-Accuracy
Scalability
Accuracy Evaluation
Cost Analysis
Conclusion
Ease of Hyperparameter Search

Figures (14)

Figure 1: Different data management pipelines in two example systems: PipeGCN (full-graph) and DGL (mini-batch). The diagrams omit gradient synchronization.
Figure 2: Classes of optimizations that impact performance (P) and accuracy (A), and representative systems evaluated in this work.
Figure 3: Convergence curve for GraphSAGE (NVLink host).
Figure 4: Convergence curve for GraphSAGE (3 NVLink hosts).
Figure 5: Test accuracy reported in the literature on GNN training.
...and 9 more figures
