Training speedups via batching for geometric learning: an analysis of static and dynamic algorithms

Daniel T. Speckhard, Tim Bechtel, Sebastian Kehl, Jonathan Godwin, Claudia Draxl

TL;DR

The paper investigates static and dynamic batching for training GNNs on QM9 and AFLOW, revealing that batching choice can yield up to $2.7\times$ speedups in training time (up to $12.5\times$ when including the slowest static-constant case) depending on data, model, and hardware. By formalizing batching variants and conducting extensive timing across multiple models (SchNet, MPEU, PaiNN) and datasets, it shows dynamic batching often matches or surpasses static-64 in practical training time while avoiding recompilations. The study also documents that, for some configurations, dynamic batching can produce significantly different test metrics, suggesting a nuanced trade-off between speed and learning behavior. Overall, the work provides practical guidance for reducing GNN training costs, especially in neural architecture searches and large-scale experiments, by favoring dynamic or static-64 batching over the static-constant approach. Future work includes predicting the fastest algorithm from dataset and model characteristics and enabling efficient multi-GPU batching.

Abstract

Graph neural networks (GNNs) have shown promising results for several domains such as materials science, chemistry, and the social sciences. GNN models often contain millions of parameters and, like other neural network (NN) models, are often fed only a fraction of the graphs that make up the training dataset in batches to update model parameters. The effect of batching algorithms on training time and model performance has been thoroughly explored for NNs but not yet for GNNs. We analyze two different batching algorithms for graph-based models, namely static and dynamic batching, for two datasets: the QM9 dataset of small molecules and the AFLOW materials database. Our experiments show that changing the batching algorithm can provide up to a 2.7x speedup, but the fastest algorithm depends on the data, model, batch size, hardware, and number of training steps run. Experiments also show that, for a select number of combinations of batch size, dataset, and model, significant differences in model learning metrics are observed between static and dynamic batching algorithms.
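To make the distinction between the two batching families concrete, the following is a minimal sketch and not the authors' implementation. It assumes that static batching fixes the number of graphs per batch and pads the node count up to a multiple of 64 or a power of two, and that dynamic batching fixes a padded node budget and lets the number of graphs vary. It tracks node counts only and ignores edges and any extra padding graph. All function names and the example numbers are hypothetical.

```python
def pad_to_multiple(n, multiple=64):
    """Round n up to the next multiple of `multiple` (assumed static-64 style)."""
    return ((n + multiple - 1) // multiple) * multiple


def pad_to_power_of_two(n):
    """Round n up to the next power of two (assumed static-2^N style)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def static_batches(node_counts, graphs_per_batch, pad_fn):
    """Fixed number of graphs per batch; the padded node size can vary."""
    batches = []
    for i in range(0, len(node_counts), graphs_per_batch):
        chunk = node_counts[i:i + graphs_per_batch]
        batches.append((chunk, pad_fn(sum(chunk))))
    return batches


def dynamic_batches(node_counts, node_budget, max_graphs):
    """Fixed padded node size; the number of graphs per batch can vary."""
    batches, current = [], []
    for n in node_counts:
        if n > node_budget:
            raise ValueError("a single graph exceeds the node budget")
        # Close the current batch if adding this graph would break a limit.
        if current and (sum(current) + n > node_budget
                        or len(current) == max_graphs):
            batches.append((current, node_budget))
            current = []
        current.append(n)
    if current:
        batches.append((current, node_budget))
    return batches


def distinct_padded_sizes(batches):
    """Number of different padded shapes seen across the batches."""
    return len({padded for _, padded in batches})


# Hypothetical node counts for eight small graphs.
node_counts = [5, 9, 12, 18, 7, 3, 21, 4]
static = static_batches(node_counts, 2, pad_to_power_of_two)
dynamic = dynamic_batches(node_counts, node_budget=40, max_graphs=4)
```

In this toy run the static power-of-two variant produces two distinct padded sizes, while the dynamic variant always produces one. Under just-in-time compilation each new input shape typically triggers a recompilation, which is one plausible reason the choice of batching algorithm affects training time.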

Paper Structure

This paper contains 18 sections, 7 equations, 14 figures, 1 table, 5 algorithms.

Figures (14)

  • Figure 1: Left: Histograms of the number of nodes (top) and edges (bottom) in the AFLOW and QM9 datasets. Right: Histograms of the number of nodes (top) and edges (bottom) in a batch of size 32 before padding for the static-$2^N$ and dynamic batching algorithms running on the QM9 dataset.
  • Figure 2: The running average of the combined time (the sum of the batching step and the gradient-update step) required per training step as a function of the total number of training steps run. Here only a single iteration is run for batch size 32, the MPEU model, and the QM9 dataset, for the dynamic, static-64, and static-$2^N$ algorithms.
  • Figure 3: Left: number of recompilations in the gradient-update step on the QM9 dataset after two million training steps, as a function of batch size, for different batching algorithms. Note that these numbers are model and hardware independent. Right: histogram of the number of graphs before padding in a batch of size 32 for the AFLOW and QM9 datasets.
  • Figure 4: Timing measurements while varying the batch size for SchNet (left two columns), MPEU (middle two columns), and PaiNN (right two columns). Results for the AFLOW and QM9 datasets are shown for each model. For each data point, ten iterations of two million training steps are run.
  • Figure 5: Speedup, on GPU, when switching from the slowest algorithm in terms of combined training time per step for the PaiNN model (not including the static-constant algorithm) for the AFLOW (top) and QM9 (bottom) datasets. For both datasets the slowest algorithm is the static-$2^N$ algorithm. If the static-constant algorithm is included, the speedup increases to a maximum of $12.5\times$.
  • ...and 9 more figures