RBMD: A molecular dynamics package enabling to simulate 10 million all-atom particles in a single graphics processing unit
Weihang Gao, Teng Zhao, Yongfa Guo, Jiuyang Liang, Huan Liu, Maoying Luo, Zedong Luo, Wei Qin, Yichao Wang, Qi Zhou, Shi Jin, Zhenli Xu
TL;DR
RBMD is a molecular dynamics package that enables large-scale all-atom simulations on a single GPU with one CPU core by applying random batch methods to both the long-range and short-range nonbonded interactions. Long-range forces are computed with Random Batch Ewald (RBE), which needs no FFT and achieves $O(N)$ complexity, while short-range forces use Random Batch List (RBL), whose core-shell neighbor scheme reduces memory and computation. Implemented within the VTK-m framework, RBMD exploits GPU-CPU heterogeneity through Map, Reduce, and Reduce-by-Key primitives to achieve high scalability and efficiency, and its accuracy is comparable to that of LAMMPS on Lennard-Jones (LJ), electrolyte, and SPC/E water benchmarks. The results show substantial speedups over the particle-particle particle-mesh and Verlet list methods in LAMMPS and the ability to simulate up to $10^7$ particles on a single GPU, which makes large-scale MD studies practical on a desktop machine. The work sets the stage for further improvements (e.g., RBSOG, TIP4P, multi-GPU configurations) and opens large-scale all-atom MD to more users and applications.
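The core-shell idea behind RBL can be illustrated with a minimal Python sketch. It is not RBMD's implementation: the function names (`lj_force`, `rbl_force`, `direct_force`), the reduced Lennard-Jones units, and the brute-force neighbor search are assumptions made for illustration, and the sketch omits periodic boundaries, cell lists, and any momentum-conserving correction a production code might apply. Neighbors inside the core radius are summed exactly; neighbors in the surrounding shell are represented by a small random batch whose contribution is rescaled by (shell size / batch size).

```python
import math
import random


def lj_force(rij, eps=1.0, sigma=1.0):
    """Lennard-Jones force on particle i from j, where rij = r_i - r_j."""
    r2 = sum(c * c for c in rij)
    s6 = (sigma * sigma / r2) ** 3
    coef = 24.0 * eps * (2.0 * s6 * s6 - s6) / r2
    return [coef * c for c in rij]


def _split_neighbors(i, pos, r_core, r_shell):
    """Brute-force split of displacement vectors into core and shell sets."""
    core, shell = [], []
    for j, rj in enumerate(pos):
        if j == i:
            continue
        rij = [a - b for a, b in zip(pos[i], rj)]
        r = math.sqrt(sum(c * c for c in rij))
        if r < r_core:
            core.append(rij)
        elif r < r_shell:
            shell.append(rij)
    return core, shell


def direct_force(i, pos, r_cut):
    """Reference: sum over every neighbor within r_cut."""
    core, _ = _split_neighbors(i, pos, r_cut, r_cut)
    total = [0.0, 0.0, 0.0]
    for rij in core:
        total = [a + b for a, b in zip(total, lj_force(rij))]
    return total


def rbl_force(i, pos, r_core, r_shell, batch_size, rng):
    """Core summed exactly; shell estimated from a rescaled random batch."""
    core, shell = _split_neighbors(i, pos, r_core, r_shell)
    total = [0.0, 0.0, 0.0]
    for rij in core:
        total = [a + b for a, b in zip(total, lj_force(rij))]
    if shell:
        p = min(batch_size, len(shell))
        scale = len(shell) / p  # keeps the shell estimate unbiased
        for rij in rng.sample(shell, p):
            total = [a + scale * b for a, b in zip(total, lj_force(rij))]
    return total
```

When the batch covers the whole shell, the estimate reduces to the direct sum over the full cutoff. With a small batch, only the core list and a handful of shell entries are needed per particle, which is where the memory and computation savings come from.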
Abstract
This paper introduces a random-batch molecular dynamics (RBMD) package for fast simulations of particle systems at the nano/micro scale. Unlike existing packages, RBMD uses random batch methods for the nonbonded interactions of particle systems. The long-range part of the Coulomb interactions is calculated in Fourier space by the random batch Ewald algorithm, which achieves linear complexity and superscalability, surpassing classical lattice-based Ewald methods. For the short-range part, the random batch list algorithm is used to construct neighbor lists, significantly reducing both computational and memory costs. RBMD is implemented on GPU-CPU heterogeneous architectures, with classical force fields for all-atom systems. Benchmark systems are used to validate the accuracy and performance of the package. Comparisons with the particle-particle particle-mesh method and the Verlet list method in the LAMMPS package are performed on three different NVIDIA GPUs, demonstrating the high efficiency of RBMD on heterogeneous architectures. Our results also show that RBMD enables simulations of up to 10 million particles on a single GPU with a CPU core. Typically, for systems of one million particles, RBMD simulates all-atom systems at 8.20 ms per step, which makes it attractive for running large-scale simulations of practical applications on a desktop machine.
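The Fourier-space step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the package's code: it uses Gaussian units, a cubic box, an Ewald splitting factor `alpha` entering as $e^{-k^2/(4\alpha)}$, and direct sampling from an enumerated, truncated table of wavevectors (the cutoff `nmax` and all function names are hypothetical). Instead of summing over every wavevector, a small batch is drawn with probability proportional to the Gaussian weight, and the batch average is rescaled by the weight sum `S`.

```python
import cmath
import math
import random
from itertools import product


def kspace_modes(box, alpha, nmax):
    """Nonzero wavevectors k = 2*pi*n/box and weights exp(-k^2 / (4*alpha))."""
    modes, weights = [], []
    for n in product(range(-nmax, nmax + 1), repeat=3):
        if n == (0, 0, 0):
            continue
        k = [2.0 * math.pi * c / box for c in n]
        k2 = sum(c * c for c in k)
        modes.append(k)
        weights.append(math.exp(-k2 / (4.0 * alpha)))
    return modes, weights


def fourier_forces(pos, q, box, modes, coeffs):
    """F_i = -sum_k c_k (4*pi*q_i / (V k^2)) k Im(exp(-i k.r_i) rho(k))."""
    vol = box ** 3
    forces = [[0.0, 0.0, 0.0] for _ in pos]
    for k, c in zip(modes, coeffs):
        k2 = sum(x * x for x in k)
        phases = [cmath.exp(1j * sum(a * b for a, b in zip(k, r))) for r in pos]
        rho = sum(qi * ph for qi, ph in zip(q, phases))  # structure factor
        for i, (qi, ph) in enumerate(zip(q, phases)):
            g = -c * 4.0 * math.pi * qi / (vol * k2) * (ph.conjugate() * rho).imag
            for d in range(3):
                forces[i][d] += g * k[d]
    return forces


def ewald_fourier_exact(pos, q, box, alpha, nmax):
    """Reference: full (truncated) Fourier sum with the Gaussian weights."""
    modes, weights = kspace_modes(box, alpha, nmax)
    return fourier_forces(pos, q, box, modes, weights)


def rbe_fourier(pos, q, box, alpha, nmax, batch_size, rng):
    """Random-batch estimate: sample modes with probability weight / S."""
    modes, weights = kspace_modes(box, alpha, nmax)
    s = sum(weights)
    batch = rng.choices(modes, weights=weights, k=batch_size)
    return fourier_forces(pos, q, box, batch, [s / batch_size] * batch_size)
```

Because every particle uses the same sampled batch, the estimated forces still sum to zero, so total momentum is preserved in this sketch. The cost per step scales with the batch size times the particle count rather than with the number of Fourier modes, which is consistent with the linear complexity stated above.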
