Scaling Molecular Dynamics with ab initio Accuracy to 149 Nanoseconds per Day
Jianxiong Li, Boyang Li, Zhuoqiang Guo, Mingzhen Li, Enji Li, Lijun Liu, Guojun Yuan, Zhan Wang, Guangming Tan, Weile Jia
TL;DR
This work targets the difficulty of sustaining molecular dynamics with ab initio accuracy over long timescales by enhancing DeePMD-kit on the Fugaku supercomputer. The authors introduce a node-based parallelization approach that reduces inter-node communication, complemented by computation and memory optimizations (TensorFlow removal, SVE-GEMM, mixed precision, RDMA memory pools, and threadpool execution) and an intra-node load-balancing strategy. Together, these advances yield a 31.7x speedup, reaching up to 149 ns/day for copper and 68.5 ns/day for water on 12,000 nodes, with sustained strong scaling and practical load balance. The authors present the results as demonstrating the feasibility of millisecond-scale ab initio MD within a week, with broad implications for neural network MD (NNMD) and domain-decomposition workloads across HPC systems.
Abstract
Physical phenomena such as chemical reactions, bond breaking, and phase transitions require molecular dynamics (MD) simulation with ab initio accuracy over timescales ranging from microseconds to milliseconds. However, previous state-of-the-art neural-network-based MD packages such as DeePMD-kit can only reach 4.7 nanoseconds per day on the Fugaku supercomputer. In this paper, we present a novel node-based parallelization scheme that reduces communication by 81%, then optimize the computationally intensive kernels with SVE-GEMM and mixed precision. Finally, we implement intra-node load balancing to further improve scalability. Numerical results on the Fugaku supercomputer show that our work improves the time-to-solution of DeePMD-kit by a factor of 31.7x, reaching 149 nanoseconds per day on 12,000 computing nodes. This work opens the door to millisecond simulation with ab initio accuracy within one week for the first time.
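The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical and not part of DeePMD-kit, and the only inputs are the two throughput values quoted above (4.7 and 149 nanoseconds per day).

```python
# Throughput figures quoted in the abstract (nanoseconds of simulated time per day).
BASELINE_NS_PER_DAY = 4.7     # previous DeePMD-kit result on Fugaku
OPTIMIZED_NS_PER_DAY = 149.0  # this work, 12,000 nodes


def speedup(baseline_ns_per_day: float, optimized_ns_per_day: float) -> float:
    """Return the time-to-solution speedup implied by two throughputs."""
    if baseline_ns_per_day <= 0:
        raise ValueError("baseline throughput must be positive")
    return optimized_ns_per_day / baseline_ns_per_day


if __name__ == "__main__":
    factor = speedup(BASELINE_NS_PER_DAY, OPTIMIZED_NS_PER_DAY)
    # Rounded to one decimal place, this matches the reported factor.
    print(f"implied speedup: {factor:.1f}x")
```

Dividing the optimized throughput by the baseline reproduces the reported 31.7x factor to one decimal place, so the two throughput numbers and the speedup are mutually consistent.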
