DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials

Kevin Han, Bowen Deng, Amir Barati Farimani, Gerbrand Ceder

TL;DR

DistMLIP tackles the challenge of scaling atomistic simulations by enabling multi-GPU inference for machine learning interatomic potentials (MLIPs). It uses graph-level partitioning that distributes both atom graphs and three-body bond graphs across GPUs with zero redundancy, and exposes this through a plug-in interface that works with existing model architectures. The platform is demonstrated on CHGNet, MACE, TensorNet, and eSEN, simulating systems up to 3.4× larger and running up to 8× faster than previous multi-GPU methods, with near-million-atom calculations taking a few seconds on 8 GPUs. This could significantly accelerate materials, chemistry, and biophysics research by enabling large-scale, accurate atomistic simulations previously constrained by hardware and parallelization limitations.

Abstract

Large-scale atomistic simulations are essential to bridge computational materials and chemistry to realistic materials and drug discovery applications. In the past few years, rapid developments of machine learning interatomic potentials (MLIPs) have offered a solution to scale up quantum mechanical calculations. Parallelizing these interatomic potentials across multiple devices is a challenging but promising approach to further extending simulation scales to real-world applications. In this work, we present DistMLIP, an efficient distributed inference platform for MLIPs based on zero-redundancy, graph-level parallelization. In contrast to conventional spatial partitioning parallelization, DistMLIP enables efficient MLIP parallelization through graph partitioning, allowing multi-device inference on flexible MLIP model architectures like multi-layer graph neural networks. DistMLIP presents an easy-to-use, flexible, plug-in interface that enables distributed inference of pre-existing MLIPs. We demonstrate DistMLIP on four widely used and state-of-the-art MLIPs: CHGNet, MACE, TensorNet, and eSEN. We show that DistMLIP can simulate atomic systems 3.4× larger and up to 8× faster compared to previous multi-GPU methods. We show that existing foundation potentials can perform near-million-atom calculations at the scale of a few seconds on 8 GPUs with DistMLIP.
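
The contrast between spatial and graph-level partitioning can be made concrete with a little arithmetic. In a message-passing model with K layers and cutoff r_c, each atom's output depends on atoms up to K·r_c away, so a spatial decomposition that runs the whole model independently on each domain needs a halo that grows with K. If border features are instead exchanged between layers, a halo of one cutoff is enough for the atom graph. The sketch below is a simplified illustration for a 1-D slab decomposition; the function name, slab width, cutoff, and layer count are assumptions chosen for illustration, not values or APIs from the paper.

```python
def halo_fraction(slab_width, cutoff, num_layers, exchange_between_layers):
    """Ratio of halo (ghost) width to owned width for one interior slab.

    Assumes uniform density, so widths are proportional to atom counts.
    """
    # With per-layer feature exchange only the 1-hop border is needed;
    # without it, the halo must cover the full receptive field.
    hops = 1 if exchange_between_layers else num_layers
    halo_width = 2 * hops * cutoff  # one halo on each face of the slab
    return halo_width / slab_width


# Hypothetical numbers: 50 Å slab, 5 Å cutoff, 4 message-passing layers.
spatial = halo_fraction(50.0, 5.0, 4, exchange_between_layers=False)
graph = halo_fraction(50.0, 5.0, 4, exchange_between_layers=True)
print(f"spatial halo / owned: {spatial:.2f}")
print(f"graph   halo / owned: {graph:.2f}")
```

Under these assumed numbers the spatial scheme carries four times as much halo per slab, which is the kind of redundancy that graph-level partitioning is meant to avoid.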

Paper Structure

This paper contains 31 sections, 3 equations, 10 figures, 3 tables, and 3 algorithms.

Figures (10)

  • Figure 1: An overview of DistMLIP. (a) DistMLIP takes public MLIP models and performs large-scale, distributed simulations. (b) Partition the atom graph using a vertical spatial partitioning scheme, and construct subgraphs containing the 1-hop neighbors and 2-hop neighbors of the original partition, which are later used to calculate the distributed bond graphs. (c) Take the 2-hop atom graph and create an edge table backbone mapping node IDs (black) to edge IDs (orange) that contain the node ID as a source node. (d) Recursively traverse the edge table to construct the atom graph and bond graph. (e) Data transfer in a simple 2-layer graph neural network with both atom graph and bond graph.
  • Figure 2: Performance scaling of DistMLIP inference with 4 pretrained MLIPs: MACE-3.8M, TensorNet-0.8M, CHGNet-2.7M, and eSEN-3.2M. All results are averaged over 10 inferences on a SiO_2 supercell. (a) Maximum capacity (number of simulatable atoms) vs. the number of GPUs. Values are normalized by the 1-GPU capacity. (b) Strong scaling of MLIP inference on DistMLIP, where the total number of atoms in the supercell is held constant while the number of GPUs increases. (c) Weak scaling behavior of MLIP inference on DistMLIP, where the number of atoms on each GPU device is held constant while the number of GPUs increases.
  • Figure 3: Effect of model configurations on graph-parallelized inference performance. (a) Inference time vs. MLIP interaction range while keeping model parameter size fixed. Values are represented as multiples of the 10 Å interaction range. (b) Inference time and (c) maximum simulation capacity vs. number of parameters in the MLIP, while keeping interaction range fixed.
  • Figure 4: Sample simulation cells from real-world systems that are benchmarked in the table of real-world systems. (a) Li_3PO_4 supercell of 216.0k atoms. (b) H_2O supercell of 210.1k atoms. (c) GaN supercell of 250.0k atoms. (d) Cd_2B_2H_48C_55N_6(O_2F)_4 metal-organic framework (MOF) system of 216.0k atoms. (e) 2w49, an insect flight muscle protein of 69.3k atoms.
  • Figure 5: Timing breakdown, by percentage, for CHGNet-2.7M, MACE-3.8M, TensorNet-0.8M, and eSEN-3.2M models across data transfer, backward pass (for force calculation), forward pass, and graph construction. The total number of atoms is held fixed across all GPU runs.
  • ...and 5 more figures
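
The steps named in the Figure 1 caption (slab partitioning, 1-hop and 2-hop subgraphs, an edge table keyed by source node, and a bond graph built by traversing that table) can be sketched on a toy system. This is a minimal illustration under stated assumptions, not DistMLIP's implementation: all function names are hypothetical, the system is a non-periodic 1-D chain, and the bond-graph convention used here (pair edge i→j with edge j→k for k ≠ i) is one common choice that may differ from the paper's.

```python
from collections import defaultdict


def slab_partition(x_coords, box_length, num_parts):
    """Assign each atom to a vertical slab according to its x coordinate."""
    width = box_length / num_parts
    return [min(int(x // width), num_parts - 1) for x in x_coords]


def k_hop_nodes(owned, edges, hops):
    """Owned nodes plus every node that can send a message to them within `hops` edges."""
    incoming = defaultdict(set)
    for src, dst in edges:
        incoming[dst].add(src)
    reached, frontier = set(owned), set(owned)
    for _ in range(hops):
        frontier = {s for d in frontier for s in incoming[d]} - reached
        reached |= frontier
    return reached


def build_edge_table(edges):
    """Map each node ID to the IDs of edges that have it as their source node."""
    table = defaultdict(list)
    for edge_id, (src, _dst) in enumerate(edges):
        table[src].append(edge_id)
    return dict(table)


def bond_graph(edges, edge_table):
    """Pair edge (i -> j) with every edge (j -> k), k != i, via the edge table."""
    pairs = []
    for e1, (src, dst) in enumerate(edges):
        for e2 in edge_table.get(dst, []):
            if edges[e2][1] != src:
                pairs.append((e1, e2))
    return pairs


# Toy system: six atoms on a line, neighbors connected in both directions.
x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
edges = []
for i in range(5):
    edges += [(i, i + 1), (i + 1, i)]

owner = slab_partition(x, box_length=6.0, num_parts=2)
owned0 = {i for i, p in enumerate(owner) if p == 0}
one_hop = k_hop_nodes(owned0, edges, hops=1)
two_hop = k_hop_nodes(owned0, edges, hops=2)
edge_table = build_edge_table(edges)
bond_pairs = bond_graph(edges, edge_table)

print("owner per atom:", owner)
print("partition 0 owned / 1-hop / 2-hop:", sorted(owned0), sorted(one_hop), sorted(two_hop))
print("edge table:", edge_table)
print("bond-graph pairs:", bond_pairs)
```

In this toy case partition 0 owns atoms 0 to 2, its 1-hop subgraph adds atom 3, and its 2-hop subgraph adds atom 4, which is the extra border a bond graph needs because each bond-graph node is itself an edge of the atom graph.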