Breaking the mold: overcoming the time constraints of molecular dynamics on general-purpose hardware

Danny Perez, Aidan Thompson, Stan Moore, Tomas Oppelstrup, Ilya Sharapov, Kylee Santos, Amirali Sharifian, Delyan Z. Kalchev, Robert Schreiber, Scott Pakin, Edgar A. Leon, James H. Laros, Michael James, Sivasankaran Rajamanickam

TL;DR

The paper addresses the long-standing inability of molecular dynamics (MD) to reach millisecond timescales on general-purpose hardware. It presents an Embedded Atom Method (EAM) MD implementation on the fully programmable Cerebras Wafer-Scale Engine (WSE) that exploits massive on-chip parallelism and ultra-low-latency, fine-grained communication to achieve simulation rates of up to $1.144\times 10^6$ steps/s for approximately $2\times 10^5$ atoms. By comparing against Frontier and prior bespoke accelerators, the work demonstrates that wafer-scale hardware can access long-timescale atomistic dynamics directly, dramatically expanding the space of feasible MD simulations. The approach has broad implications for materials science: it enables direct study of slow diffusive processes and microstructural evolution, with possible extensions to other interatomic potentials and to multi-WSE clusters that would scale system size and time horizons.

Abstract

The evolution of molecular dynamics (MD) simulations has been intimately linked to that of computing hardware. For decades following the creation of MD, simulations have improved with computing power along the three principal dimensions of accuracy, atom count (spatial scale), and duration (temporal scale). Since the mid-2000s, however, computer platforms have failed to provide strong scaling for MD, as scale-out CPU and GPU platforms that provide substantial increases to spatial scale do not lead to proportional increases in temporal scale. Important scientific problems therefore remained inaccessible to direct simulation, prompting the development of increasingly sophisticated algorithms that present significant complexity, accuracy, and efficiency challenges. While bespoke MD-only hardware solutions have provided a path to longer timescales for specific physical systems, their impact on the broader community has been mitigated by their limited adaptability to new methods and potentials. In this work, we show that a novel computing architecture, the Cerebras Wafer Scale Engine, completely alters the scaling path by delivering unprecedentedly high simulation rates up to 1.144M steps/second for 200,000 atoms whose interactions are described by an Embedded Atom Method potential. This enables direct simulations of the evolution of materials using general-purpose programmable hardware over millisecond timescales, dramatically increasing the space of direct MD simulations that can be carried out.

Paper Structure

This paper contains 7 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: (a) Progression of maximum simulation speed over six decades. Historically prominent benchmarks for several classes of interatomic interaction model are plotted versus time: Lennard-Jones (LJ, black), biomolecular (Bio, red) and Embedded Atom Method (EAM, green). Benchmarks for all three model classes have stagnated since 2010. Exceptions to this are the Anton special-purpose ASIC supercomputers (red/black) and now the Cerebras wafer-scale engine (green/black). Inset: Historical clock frequency data for CPU processors (blue [cpudb], red [cpuwikipedia]) and GPU processors (green [gpgpuwikipedia]). (b) An expanded view of Anton and Cerebras performance (Msteps/s, linear scale) during the time period of the dashed rectangle in (a). More details on each of these benchmarks are provided in Table \ref{tab:history}.
  • Figure 2: Optimizations of communication patterns that enable fine-grain parallelism. (a) T-shaped communication pattern uses multicast along three one-dimensional segments. (b) Midpoint core handling of interactions between two atoms. Black cores contain atoms, while gray cores are dedicated to processing atom interactions.
  • Figure 3: Accessible simulation space, assuming 24 hours of run time with a 1 fs timestep, for EAM models running on Fugaku (black), Frontier (blue), and Cerebras (green/black), and for a biomolecular model on the 64-node Anton-3 supercomputer (red/black). For Frontier, we assume that the small-scale benchmarks (obtained on 128 nodes) can be weak-scaled to the full machine (9,408 nodes). We also show the performance for small atom counts on a single Frontier MI250X GCD. Dashed lines indicate upper bounds on performance for Frontier and MI250X in the limits of ideal scaling (diagonal) and maximum speed (vertical). Inset: Enlarged view of the Anton-3 and Cerebras points within the dashed rectangle.
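
The time axis of Figure 3 follows from simple arithmetic: simulated time is the stepping rate times the wall-clock budget times the timestep. The sketch below applies this to the peak rate quoted in the abstract (1.144M steps/s) with the 24-hour, 1 fs assumptions of Figure 3. The function and variable names are illustrative and do not come from the paper, and the calculation assumes the peak rate is sustained for the whole run.

```python
# Minimal sketch: convert an MD stepping rate into simulated physical time.
# Assumption: the peak rate from the abstract is sustained for the full run.

SECONDS_PER_DAY = 24 * 3600
FS_PER_US = 1e9    # femtoseconds in one microsecond
FS_PER_MS = 1e12   # femtoseconds in one millisecond


def simulated_time_fs(steps_per_second, wall_seconds, timestep_fs=1.0):
    """Simulated time (fs) reached after wall_seconds of run time."""
    return steps_per_second * wall_seconds * timestep_fs


def wall_days_for(target_fs, steps_per_second, timestep_fs=1.0):
    """Wall-clock days needed to reach target_fs of simulated time."""
    wall_seconds = target_fs / (steps_per_second * timestep_fs)
    return wall_seconds / SECONDS_PER_DAY


RATE = 1.144e6  # steps/s, as quoted in the abstract

# Simulated time in one day, using the Figure 3 assumptions (24 h, 1 fs).
per_day_us = simulated_time_fs(RATE, SECONDS_PER_DAY) / FS_PER_US

# Wall-clock days required to accumulate one millisecond of simulated time.
days_to_one_ms = wall_days_for(FS_PER_MS, RATE)

print(f"simulated time per day: {per_day_us:.1f} microseconds")
print(f"days to reach 1 ms:     {days_to_one_ms:.1f}")
```

Under these assumptions a single day yields roughly 99 µs of simulated time, so reaching one millisecond corresponds to about ten days of continuous running at the peak rate.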