Scaling Neural-Network-Based Molecular Dynamics with Long-Range Electrostatic Interactions to 51 Nanoseconds per Day
Jianxiong Li, Beining Zhang, Mingzhen Li, Siyu Hu, Jinzhe Zeng, Lijun Liu, Guojun Yuan, Zhan Wang, Guangming Tan, Weile Jia
TL;DR
The paper addresses the bottlenecks in scaling neural-network-based molecular dynamics with long-range electrostatics by optimizing the DPLR framework on Fugaku. It introduces a hardware-offloaded 3D FFT (uTofu-FFT), an intra-node overlap strategy that nearly hides the PPPM computation, and a ring-based atom-level load balancing scheme, along with node-level task division and a framework-free model-inference pipeline. Together these contributions yield a 37× speedup over the baseline, reaching 51 ns/day for a 564-atom system and 32 ns/day for 403k atoms, while preserving ab initio accuracy. The results show strong performance gains from architecture-specific features and offer insights transferable to other NNMD and spatial-decomposition workloads.
Abstract
Neural network-based molecular dynamics (NNMD) simulations that incorporate long-range electrostatic interactions have significantly extended the applicability of NNMD to heterogeneous and ionic systems, enabling effective modeling of critical physical phenomena such as protein folding and dipolar surfaces while maintaining ab initio accuracy. However, neural network inference and long-range force computation remain the major bottlenecks, severely limiting simulation speed. In this paper, we target DPLR, a state-of-the-art NNMD package that supports long-range electrostatics, and propose a comprehensive set of optimizations to enhance computational efficiency. We introduce (1) a hardware-offloaded FFT method that reduces communication overhead; (2) an overlapping strategy that hides long-range force computations using a single core per node; and (3) a ring-based load balancing method that redistributes tasks evenly at the atom level with minimal communication overhead. Experimental results on the Fugaku supercomputer show that our work achieves a 37× performance improvement, reaching a maximum simulation speed of 51 ns/day.
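The abstract does not spell out how the ring-based load balancing works, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes that workers are arranged in a logical ring, that each worker exchanges atoms only with its ring successor, and that the goal is to equalize per-worker atom counts while moving as few atoms as possible. The function names `ring_balance` and `apply_flows` are hypothetical. It also ignores the constraint that a worker cannot forward atoms it has not yet received within a single step.

```python
from statistics import median_low


def ring_balance(counts):
    """Compute per-link transfers that even out atom counts on a ring.

    counts[i] is the number of atoms owned by worker i (hypothetical setup).
    Returns (targets, flows), where flows[i] is the number of atoms worker i
    sends to worker (i + 1) % n; a negative value means the successor sends
    atoms back to worker i instead.
    """
    n = len(counts)
    base, rem = divmod(sum(counts), n)
    # Integer targets: the first `rem` workers take one extra atom.
    targets = [base + (1 if i < rem else 0) for i in range(n)]

    # Running surplus around the ring; flows are fixed up to a constant shift.
    prefix = []
    running = 0
    for c, t in zip(counts, targets):
        running += c - t
        prefix.append(running)

    # Shifting by the median minimizes the total number of atoms moved.
    shift = median_low(prefix)
    flows = [p - shift for p in prefix]
    return targets, flows


def apply_flows(counts, flows):
    """Apply the neighbor-to-neighbor transfers and return the new counts."""
    n = len(counts)
    new_counts = list(counts)
    for i, f in enumerate(flows):
        new_counts[i] -= f
        new_counts[(i + 1) % n] += f
    return new_counts


if __name__ == "__main__":
    counts = [10, 2, 6, 6]
    targets, flows = ring_balance(counts)
    print(targets, flows, apply_flows(counts, flows))
```

Because every transfer stays between ring neighbors, each worker needs only point-to-point messages to one or two peers, which is one plausible reading of "minimal communication overhead"; the scheme in the paper may differ in how targets are chosen and how transfers are scheduled.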
