DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials
Kevin Han, Bowen Deng, Amir Barati Farimani, Gerbrand Ceder
TL;DR
DistMLIP tackles the challenge of scaling atomistic simulations by enabling multi-GPU inference for machine learning interatomic potentials (MLIPs). It does this with graph-level partitioning that distributes both atom graphs and three-body bond graphs across GPUs with zero redundancy, which keeps the approach architecture-agnostic and usable as a plug-in for existing models. The platform is demonstrated on CHGNet, MACE, TensorNet, and eSEN, simulating systems up to 3.4× larger and running up to 8× faster than previous multi-GPU methods, and it brings near-million-atom calculations down to a few seconds on 8 GPUs. This work could significantly accelerate materials, chemistry, and biophysics research by enabling large-scale, accurate atomistic simulations that were previously constrained by hardware and parallelization limits.
Abstract
Large-scale atomistic simulations are essential for bridging computational materials science and chemistry with realistic materials and drug discovery applications. In the past few years, the rapid development of machine learning interatomic potentials (MLIPs) has offered a way to scale up quantum mechanical calculations. Parallelizing these interatomic potentials across multiple devices is a challenging but promising approach to extending simulation scales further toward real-world applications. In this work, we present DistMLIP, an efficient distributed inference platform for MLIPs based on zero-redundancy, graph-level parallelization. In contrast to conventional spatial partitioning, DistMLIP parallelizes MLIPs through graph partitioning, which allows multi-device inference on flexible MLIP model architectures such as multi-layer graph neural networks. DistMLIP provides an easy-to-use, flexible, plug-in interface that enables distributed inference with pre-existing MLIPs. We demonstrate DistMLIP on four widely used, state-of-the-art MLIPs: CHGNet, MACE, TensorNet, and eSEN. We show that DistMLIP can simulate atomic systems 3.4× larger and up to 8× faster than previous multi-GPU methods. We also show that, with DistMLIP, existing foundation potentials can perform near-million-atom calculations on the order of a few seconds on 8 GPUs.
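To make the idea of graph-level partitioning more concrete, the following is a minimal sketch in plain Python. It is not DistMLIP's API or its actual algorithm. The function names (`build_edges`, `partition_graph`), the choice of slabs along a single axis, and the omission of periodic boundaries and of the three-body bond graph are all assumptions made for illustration. The sketch assigns every directed edge to exactly one partition (the one that owns its destination atom), so no edge is computed twice, and it records which source atoms from other partitions each partition would need to receive.

```python
from collections import defaultdict


def build_edges(positions, cutoff):
    """Directed neighbor list (src, dst) for atom pairs within `cutoff`.

    Hypothetical helper: brute-force O(N^2), no periodic images.
    """
    edges = []
    for i, pi in enumerate(positions):
        for j, pj in enumerate(positions):
            if i == j:
                continue
            dist_sq = sum((a - b) ** 2 for a, b in zip(pi, pj))
            if dist_sq <= cutoff ** 2:
                edges.append((i, j))
    return edges


def partition_graph(positions, edges, num_parts, axis=0):
    """Split atoms into slabs along `axis` and give each edge a single owner.

    Assumption for illustration: an edge belongs to the partition that owns
    its destination atom, so every edge appears in exactly one partition.
    """
    lo = min(p[axis] for p in positions)
    hi = max(p[axis] for p in positions)
    width = (hi - lo) / num_parts or 1.0  # guard against a zero-width box

    # Partition index that owns each atom.
    owner = [min(int((p[axis] - lo) / width), num_parts - 1) for p in positions]

    part_edges = defaultdict(list)  # partition -> edges it computes
    border = defaultdict(set)       # partition -> foreign atoms it must receive
    for src, dst in edges:
        part = owner[dst]
        part_edges[part].append((src, dst))
        if owner[src] != part:
            border[part].add(src)

    return owner, dict(part_edges), {k: sorted(v) for k, v in border.items()}


if __name__ == "__main__":
    # Four atoms on a line, 1.0 apart; cutoff slightly above the spacing.
    positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    edges = build_edges(positions, cutoff=1.1)
    owner, part_edges, border = partition_graph(positions, edges, num_parts=2)
    print("owner:", owner)
    print("edges per partition:", part_edges)
    print("border atoms per partition:", border)
```

In this toy setup, a multi-layer graph neural network would need the features of the border atoms to be exchanged between devices before each message-passing layer, while each edge's computation still happens on only one device.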
