High-performance training and inference for deep equivariant interatomic potentials
Chuin Wei Tan, Marc L. Descoteaux, Mit Kotak, Gabriel de Miranda Nascimento, Seán R. Kavanagh, Laura Zichi, Menghang Wang, Aadit Saluja, Yizhong R. Hu, Tess Smidt, Anders Johansson, William C. Witt, Boris Kozinsky, Albert Musaelian
TL;DR
The paper addresses the scalability and performance bottlenecks of deep equivariant interatomic potentials by overhauling the NequIP framework for multi-node training and fast inference. It combines end-to-end train-time compilation with the PyTorch 2.0 compiler (TorchInductor), a custom distributed data-parallel scheme, and the Ahead-of-Time Inductor (AOTI) compiler for deployment in HPC codes, together with a fused Triton kernel for the tensor product. In a case study training Allegro models on the SPICE 2 dataset, the approach yields 2.4–5× training speedups and 4–18× inference speedups, enabling large-scale MD simulations with improved memory efficiency. The result is an extensible, HPC-ready platform for machine learning interatomic potentials (MLIPs) that can accelerate materials discovery and biomolecular simulations.
Abstract
Machine learning interatomic potentials, particularly those based on deep equivariant neural networks, have demonstrated state-of-the-art accuracy and computational efficiency in atomistic modeling tasks like molecular dynamics and high-throughput screening. The size of datasets and demands of downstream workflows are growing rapidly, making robust and scalable software essential. This work presents a major overhaul of the NequIP framework focusing on multi-node parallelism, computational performance, and extensibility. The redesigned framework supports distributed training on large datasets and removes barriers preventing full utilization of the PyTorch 2.0 compiler at train time. We demonstrate this acceleration in a case study by training Allegro models on the SPICE 2 dataset of organic molecular systems. For inference, we introduce the first end-to-end infrastructure that uses the PyTorch Ahead-of-Time Inductor compiler for machine learning interatomic potentials. Additionally, we implement a custom kernel for the Allegro model's most expensive operation, the tensor product. Together, these advancements speed up molecular dynamics calculations on system sizes of practical relevance by up to a factor of 18.
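To make the terms "equivariant" and "tensor product" concrete, the following is a minimal sketch and not the Allegro kernel or any NequIP code. It relies on one standard fact: the vector-valued part of the tensor product of two 3D vectors is, up to a normalization constant, the cross product. The script checks numerically that rotating the inputs and then taking the product gives the same result as taking the product and then rotating the output. The helper names (`cross`, `rot_z`, `rot_x`, `rotate`), the angles, and the test vectors are all illustrative assumptions.

```python
import math

def cross(a, b):
    """Cross product: the vector-valued part of the tensor product of two vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def rot_z(v, t):
    """Rotate vector v by angle t (radians) about the z axis."""
    c, s = math.cos(t), math.sin(t)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1], v[2])

def rot_x(v, t):
    """Rotate vector v by angle t (radians) about the x axis."""
    c, s = math.cos(t), math.sin(t)
    return (v[0], c * v[1] - s * v[2], s * v[1] + c * v[2])

def rotate(v):
    """A fixed proper rotation (arbitrary illustrative angles)."""
    return rot_x(rot_z(v, 0.7), 1.3)

# Arbitrary illustrative inputs.
a = (1.0, 2.0, 3.0)
b = (-0.5, 0.4, 2.0)

# Equivariance: rotating the output equals applying the product to rotated inputs.
rotated_output = rotate(cross(a, b))
output_of_rotated = cross(rotate(a), rotate(b))
max_err = max(abs(x - y) for x, y in zip(rotated_output, output_of_rotated))

print("max deviation:", max_err)
```

The deviation should be at the level of floating-point round-off. Production models evaluate many such products over higher-order features for every atom pair, which is why this operation is a natural target for a fused custom kernel.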
