RBMD: A molecular dynamics package enabling to simulate 10 million all-atom particles in a single graphics processing unit

Weihang Gao, Teng Zhao, Yongfa Guo, Jiuyang Liang, Huan Liu, Maoying Luo, Zedong Luo, Wei Qin, Yichao Wang, Qi Zhou, Shi Jin, Zhenli Xu

TL;DR

RBMD is a molecular dynamics package that enables all-atom simulations on a single GPU paired with one CPU core by applying random batch methods to both the long-range and short-range nonbonded interactions. Long-range forces are computed with Random Batch Ewald (RBE), which needs no FFT and achieves $O(N)$ complexity; short-range forces use Random Batch List (RBL) with a core-shell neighbor scheme, which reduces memory and computation. Implemented within the VTK-m framework, RBMD exploits GPU-CPU heterogeneity through the Map, Reduce, and Reduce-by-Key primitives, and its accuracy is comparable to that of LAMMPS on the LJ, electrolyte, and SPC/E water benchmarks. The results indicate substantial speedups over the particle-particle particle-mesh and Verlet list methods in LAMMPS and the ability to simulate up to $10^7$ particles on a single GPU, which makes large-scale MD studies practical on desktop hardware. The work sets the stage for further improvements (e.g., RBSOG, TIP4P, multi-GPU configurations) and widens access to large all-atom MD for more users and applications.
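
To make the core-shell idea concrete, here is a minimal Python sketch of a random-batch short-range force estimate for one particle. It is an illustration under stated assumptions, not the RBMD implementation: the function names `lj_force` and `rbl_force`, the Lennard-Jones parameters, the brute-force distance loop, and the absence of periodic boundaries are all choices made for this example. Neighbors inside the core radius are summed exactly; from the surrounding shell only `p` randomly chosen neighbors are kept, and their contribution is rescaled by the ratio of shell size to batch size. Published variants of the method may include further corrections (for example, to conserve momentum), which are omitted here.

```python
import math
import random

def lj_force(dx, dy, dz, eps=1.0, sigma=1.0):
    """Lennard-Jones force on particle i from j, where (dx, dy, dz) = r_i - r_j."""
    r2 = dx * dx + dy * dy + dz * dz
    sr6 = (sigma * sigma / r2) ** 3
    # F = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2 * (r_i - r_j)
    coef = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r2
    return (coef * dx, coef * dy, coef * dz)

def rbl_force(i, pos, r_core, r_shell, p, rng):
    """Core-shell random-batch estimate of the short-range force on particle i.

    Hypothetical helper for illustration: open boundaries, O(N) scan per particle.
    """
    xi, yi, zi = pos[i]
    core, shell = [], []
    for j, (xj, yj, zj) in enumerate(pos):
        if j == i:
            continue
        d = (xi - xj, yi - yj, zi - zj)
        r = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
        if r < r_core:
            core.append(d)       # treated exactly
        elif r < r_shell:
            shell.append(d)      # candidates for the random batch
    fx = fy = fz = 0.0
    for d in core:
        f = lj_force(*d)
        fx, fy, fz = fx + f[0], fy + f[1], fz + f[2]
    if shell:
        # Keep at most p shell neighbors and rescale so the estimate stays unbiased.
        batch = shell if len(shell) <= p else rng.sample(shell, p)
        scale = len(shell) / len(batch)
        for d in batch:
            f = lj_force(*d)
            fx, fy, fz = fx + scale * f[0], fy + scale * f[1], fz + scale * f[2]
    return (fx, fy, fz)
```

In this sketch only the core neighbors and the `p` sampled shell neighbors need to be stored per particle, which is where the memory saving relative to a full neighbor list out to `r_shell` would come from. When `p` is at least the number of shell neighbors, the estimate coincides with the direct sum.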

Abstract

This paper introduces a random-batch molecular dynamics (RBMD) package for fast simulations of particle systems at the nano/micro scale. Different from existing packages, the RBMD uses random batch methods for nonbonded interactions of particle systems. The long-range part of Coulomb interactions is calculated in Fourier space by the random batch Ewald algorithm, which achieves linear complexity and superscalability, surpassing classical lattice-based Ewald methods. For the short-range part, the random batch list algorithm is used to construct neighbor lists, significantly reducing both computational and memory costs. The RBMD is implemented on GPU-CPU heterogeneous architectures, with classical force fields for all-atom systems. Benchmark systems are used to validate accuracy and performance of the package. Comparison with the particle-particle particle-mesh method and the Verlet list method in the LAMMPS package is performed on three different NVIDIA GPUs, demonstrating high efficiency of the RBMD on heterogeneous architectures. Our results also show that the RBMD enables simulations on a single GPU with a CPU core up to 10 million particles. Typically, for systems of one million particles, the RBMD allows simulating all-atom systems with a high efficiency of 8.20 ms per step, demonstrating the attractive feature for running large-scale simulations of practical applications on a desktop machine.
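
The Fourier-space step can be sketched as importance sampling over wavevectors. The Python example below is a rough illustration, not the package's code, and rests on several assumptions that are not stated in the abstract: a cubic box of side `L`, Gaussian units, the convention that the Ewald weight is $e^{-k^2/(4\alpha)}$, and a truncation of each integer component at a hypothetical `n_max`. It draws `P` nonzero wavevectors with probability proportional to that weight and replaces the full Fourier sum by a rescaled sum over the batch, so the cost per step is proportional to `P * N` rather than depending on a mesh or FFT.

```python
import cmath
import math
import random

def component_weights(L, alpha, n_max):
    """Per-component Gaussian weights exp(-k^2/(4*alpha)) for k = 2*pi*n/L, |n| <= n_max."""
    ns = list(range(-n_max, n_max + 1))
    ws = [math.exp(-((2.0 * math.pi * n / L) ** 2) / (4.0 * alpha)) for n in ns]
    return ns, ws

def sample_wavevectors(P, L, alpha, rng, n_max=16):
    """Draw P nonzero wavevectors with probability proportional to exp(-|k|^2/(4*alpha))."""
    ns, ws = component_weights(L, alpha, n_max)
    out = []
    while len(out) < P:
        # The weight factorizes, so the three components are drawn independently.
        n = tuple(rng.choices(ns, weights=ws, k=3))
        if n != (0, 0, 0):  # the k = 0 mode is excluded from the Ewald sum
            out.append(tuple(2.0 * math.pi * c / L for c in n))
    return out

def rbe_forces(pos, q, L, alpha, P, rng, n_max=16):
    """Random-batch estimate of the Fourier-space Ewald forces (illustrative sketch)."""
    V = L ** 3
    _, ws = component_weights(L, alpha, n_max)
    S = sum(ws) ** 3 - 1.0  # normalization: sum of the weight over all nonzero k
    forces = [[0.0, 0.0, 0.0] for _ in pos]
    for k in sample_wavevectors(P, L, alpha, rng, n_max):
        k2 = k[0] ** 2 + k[1] ** 2 + k[2] ** 2
        phases = [cmath.exp(1j * (k[0] * x + k[1] * y + k[2] * z)) for (x, y, z) in pos]
        rho = sum(qj * ph for qj, ph in zip(q, phases))  # structure factor rho(k)
        for i, ph in enumerate(phases):
            im = (ph.conjugate() * rho).imag
            c = -(S / P) * 4.0 * math.pi * q[i] * im / (V * k2)
            for a in range(3):
                forces[i][a] += c * k[a]
    return forces
```

One property that is easy to check in this sketch is that the estimated forces sum to zero over all particles for every sampled wavevector, because $\sum_i q_i e^{-i k \cdot r_i} \rho(k) = |\rho(k)|^2$ is real.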

Paper Structure

This paper contains 14 sections, 16 equations, 8 figures, 1 table, 1 algorithm.

Figures (8)

  • Figure 1: Simulation flowchart of RBMD. Particularly, in the RBE implementation, $P$ samples are extracted in the CPU and then transferred to the GPU. The Reduce-by-Key operation is employed to simultaneously calculate $P$ structure factors, thereby computing the RBE force.
  • Figure 2: (a) The specific process of CPU-GPU heterogeneous computing for the RBMD. The number of steps is marked in red. Map is the parallel operator. The dark blue square represents "grid", the light blue square is "block" and the grey square is "thread" in GPU; (b) the parallel operator Reduce; and (c) the parallel operator Reduce-by-Key.
  • Figure 3: Results of radial distribution function (RDF), mean squared displacement (MSD), and snapshots produced by OVITO for three benchmark systems: (a-c) the Lennard-Jones fluid; (d-f) the electrolyte solution; and (g-h) the SPC/E water system. The RBMD and LAMMPS are shown to be in good agreement.
  • Figure 4: Wall-clock time for the Lennard-Jones fluid on the Tesla V100 (a), RTX 4090 (b) and A100 (c) architectures. Results are shown for the RBL method in the RBMD and for the Verlet list method. The speedup of the RBL algorithm ($p=30$) over the Verlet list method on the Tesla V100, RTX 4090 and A100 architectures is shown in (d-f), respectively.
  • Figure 5:
  • ...and 3 more figures
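
The Figure 1 and Figure 2 captions mention Map and Reduce-by-Key for computing $P$ structure factors at once. The following Python sketch shows what such a pattern could look like in serial form. It is not the VTK-m API: `reduce_by_key` and `structure_factors` are hypothetical names written for this example, and the data layout (one key per sampled wavevector, one value per particle) is an assumption. The Map step emits a pair (sample index, $q_j e^{i k_\ell \cdot r_j}$) for every sample and particle; Reduce-by-Key then sums the values that share a key, which yields one structure factor per sample.

```python
import cmath
from functools import reduce
from itertools import groupby
from operator import add, itemgetter

def reduce_by_key(keys, values, op):
    """Combine the values of consecutive equal keys with op.

    Returns (unique_keys, reduced_values). Serial stand-in for a parallel primitive.
    """
    out_keys, out_vals = [], []
    for key, group in groupby(zip(keys, values), key=itemgetter(0)):
        out_keys.append(key)
        out_vals.append(reduce(op, (v for _, v in group)))
    return out_keys, out_vals

def structure_factors(pos, q, kvecs):
    """Compute rho(k_l) = sum_j q_j * exp(i k_l . r_j) for every sampled k_l."""
    keys, values = [], []
    # Map: one (key, value) pair per (sample, particle) combination.
    for l, k in enumerate(kvecs):
        for (x, y, z), qj in zip(pos, q):
            keys.append(l)
            values.append(qj * cmath.exp(1j * (k[0] * x + k[1] * y + k[2] * z)))
    # Reduce-by-Key: sum all values that belong to the same sample index.
    return reduce_by_key(keys, values, add)[1]

print(reduce_by_key([0, 0, 1, 1, 1, 2], [1, 2, 3, 4, 5, 6], add))
# → ([0, 1, 2], [3, 12, 6])
```

On a GPU, both steps would run in parallel over the pairs; the serial loop here only illustrates the data flow.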