Training speedups via batching for geometric learning: an analysis of static and dynamic algorithms

Daniel T. Speckhard; Tim Bechtel; Sebastian Kehl; Jonathan Godwin; Claudia Draxl

TL;DR

The paper investigates static and dynamic batching for training GNNs on QM9 and AFLOW, revealing that batching choice can yield up to $2.7\times$ speedups in training time (up to $12.5\times$ when including the slowest static-constant case) depending on data, model, and hardware. By formalizing batching variants and conducting extensive timing across multiple models (SchNet, MPEU, PaiNN) and datasets, it shows dynamic batching often matches or surpasses static-64 in practical training time while avoiding recompilations. The study also documents that, for some configurations, dynamic batching can produce significantly different test metrics, suggesting a nuanced trade-off between speed and learning behavior. Overall, the work provides practical guidance for reducing GNN training costs, especially in neural architecture searches and large-scale experiments, by favoring dynamic or static-64 batching over the static-constant approach. Future work includes predicting the fastest algorithm from dataset and model characteristics and enabling efficient multi-GPU batching.

Abstract

Graph neural networks (GNNs) have shown promising results in several domains, such as materials science, chemistry, and the social sciences. GNN models often contain millions of parameters and, like other neural network (NN) models, are usually fed only a fraction of the graphs in the training dataset at a time, in batches, to update model parameters. The effect of batching algorithms on training time and model performance has been thoroughly explored for NNs but not yet for GNNs. We analyze two batching algorithms for graph-based models, static and dynamic batching, on two datasets: the QM9 dataset of small molecules and the AFLOW materials database. Our experiments show that changing the batching algorithm can provide up to a $2.7\times$ speedup, but the fastest algorithm depends on the data, model, batch size, hardware, and number of training steps run. The experiments also show that, for a select number of combinations of batch size, dataset, and model, the model learning metrics differ significantly between static and dynamic batching.
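To make the distinction concrete, the following is a minimal sketch of how the two families of algorithms might assemble batches. It is not the authors' implementation. It assumes that static batching places a fixed number of graphs in each batch and pads the node count up to a multiple of 64 (our reading of "static-64"), and that dynamic batching greedily adds graphs until a fixed node budget or graph limit would be exceeded, then pads to that budget. The function names, the node-only bookkeeping (edges are ignored), and the example graph sizes are all hypothetical.

```python
from typing import Dict, List


def round_up(n: int, multiple: int) -> int:
    """Round n up to the nearest multiple of `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple


def static_batches(node_counts: List[int], graphs_per_batch: int,
                   multiple: int = 64) -> List[Dict[str, int]]:
    """Assumed static scheme: a fixed number of graphs per batch, with the
    node dimension padded up to the next multiple of `multiple`."""
    batches = []
    for start in range(0, len(node_counts), graphs_per_batch):
        chunk = node_counts[start:start + graphs_per_batch]
        real = sum(chunk)
        batches.append({"graphs": len(chunk), "real_nodes": real,
                        "padded_nodes": round_up(real, multiple)})
    return batches


def dynamic_batches(node_counts: List[int], node_budget: int,
                    max_graphs: int) -> List[Dict[str, int]]:
    """Assumed dynamic scheme: greedily add graphs until the node budget or
    graph limit would be exceeded, then pad every batch to `node_budget`."""
    batches: List[Dict[str, int]] = []
    current: List[int] = []

    def flush() -> None:
        batches.append({"graphs": len(current), "real_nodes": sum(current),
                        "padded_nodes": node_budget})

    for n in node_counts:
        if n > node_budget:
            raise ValueError("a single graph exceeds the node budget")
        if current and (sum(current) + n > node_budget
                        or len(current) == max_graphs):
            flush()
            current = []
        current.append(n)
    if current:
        flush()
    return batches


# Hypothetical node counts for eight small graphs.
sizes = [5, 9, 12, 3, 20, 7, 18, 4]
static_result = static_batches(sizes, graphs_per_batch=4)
dynamic_result = dynamic_batches(sizes, node_budget=40, max_graphs=4)

# Distinct padded shapes seen by the model under each scheme.
static_shapes = {b["padded_nodes"] for b in static_result}
dynamic_shapes = {b["padded_nodes"] for b in dynamic_result}
```

Under these assumptions, the static scheme keeps the number of graphs per batch constant but lets the padded size vary with the graphs drawn, while the dynamic scheme keeps the padded size constant but lets the number of graphs per batch vary. With a just-in-time compiled training step, each new padded shape can trigger a recompilation, which is one plausible reason the choice of algorithm affects training time.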
