Proven Distributed Memory Parallelization of Particle Methods

Johannes Pahlke, Ivo F. Sbalzarini

TL;DR

The paper addresses the lack of formal guarantees for distributed-memory parallelization of particle methods. It formalizes particle methods as a $7$-tuple framework and proves that a cell-list-based, checkerboard distributed scheme is equivalent to the sequential method for pull-interaction particle methods without global operations. It introduces a distributed state-transition function $\tilde{S}$ and shows that it is equivalent to the standard state-transition function $S$ up to particle ordering. The proof rests on explicit assumptions and lemmas (bijective index transforms, order-independence, non-overlapping communications) and comes with time-complexity bounds. The results yield linear scaling in the number of particles and provide a rigorous basis for provably correct HPC frameworks supporting SPH, MD, DEM, RKPM, and related methods. The work thus offers a theoretical foundation for general-purpose, correct parallel implementations of particle methods on distributed-memory architectures, and it identifies extensions to more general interaction schemes and architectures as future directions.
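To make the idea of a cell-list-based pull interaction concrete, the following is a minimal single-process sketch, not the paper's scheme. It assumes a one-dimensional domain, a hypothetical interaction that simply counts neighbors within a cutoff, and cells whose width equals the cutoff. The function names `pull_all_pairs` and `pull_cell_list` are illustrative and do not come from the paper. It only shows that restricting the search to adjacent cells leaves each particle's pulled result unchanged.

```python
from collections import defaultdict


def pull_all_pairs(positions, cutoff):
    """Sequential reference: each particle pulls from every other particle
    within the cutoff. Here the pulled quantity is just a neighbor count."""
    result = []
    for i, xi in enumerate(positions):
        count = 0
        for j, xj in enumerate(positions):
            if i != j and abs(xi - xj) <= cutoff:
                count += 1
        result.append(count)
    return result


def pull_cell_list(positions, cutoff):
    """Same pull interaction, but candidates are taken only from the
    particle's own cell and its two adjacent cells (cell width = cutoff)."""
    cells = defaultdict(list)
    for i, x in enumerate(positions):
        cells[int(x // cutoff)].append(i)

    result = [0] * len(positions)
    for c, members in cells.items():
        # .get avoids creating empty cells while iterating over the dict.
        candidates = [j for dc in (-1, 0, 1) for j in cells.get(c + dc, [])]
        for i in members:
            # Pull: particle i only reads from j and only writes to itself.
            result[i] = sum(
                1
                for j in candidates
                if j != i and abs(positions[i] - positions[j]) <= cutoff
            )
    return result


positions = [0.1, 0.5, 1.2, 2.9, 3.05, 7.0]
print(pull_all_pairs(positions, 1.0) == pull_cell_list(positions, 1.0))
```

Because each particle writes only to itself, cells can be processed in any order, which is the property a distributed version would rely on. The sketch does not model communication between processes or the checkerboard pattern.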

Abstract

We provide a mathematically proven parallelization scheme for particle methods on distributed-memory computer systems. Particle methods are a versatile and widely used class of algorithms for computer simulations and numerical predictions in various applications, ranging from continuum fluid dynamics and granular flows, using methods such as Smoothed Particle Hydrodynamics (SPH) and Discrete Element Methods (DEM), to Molecular Dynamics (MD) simulations in molecular modeling. Particle methods naturally lend themselves to implementation on parallel-computing hardware. So far, however, a mathematical proof of correctness and equivalence to sequential implementations has been available only for shared-memory parallelism. Here, we leverage a formal definition of the algorithmic class of particle methods to provide a proven parallelization scheme for distributed-memory computers. We prove that these parallelized particle methods on distributed-memory computers are formally equivalent to their sequential counterparts for a well-defined class of particle methods. Notably, the parallelization scheme analyzed here is well known and commonly used. Our analysis is, therefore, of immediate practical relevance to existing and new parallel software implementations of particle methods and places them on solid theoretical grounds.

Paper Structure

This paper contains 10 sections, 8 theorems, 234 equations, 4 figures.

Key Result

Lemma 1

${}^{\overline{\underline{\mathbf{I}}}} \iota$ and ${}^{\overline{\underline{\mathbf{I}}}} \iota^{-1}$ are bijections and mutual functional inverses, i.e., $\left({}^{\overline{\underline{\mathbf{I}}}} \iota^{-1}\right)^{-1} = {}^{\overline{\underline{\mathbf{I}}}} \iota$.
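As an illustration of what such a bijective index transform can look like, the sketch below maps a multi-index on a regular grid of cells to a linear index and back. It assumes a row-major layout and a grid shape given as a tuple; the names `iota` and `iota_inv` and the layout are assumptions made for this example, and the paper's definition of ${}^{\overline{\underline{\mathbf{I}}}} \iota$ may differ. The loop at the end checks the two properties the lemma names: every linear index is hit exactly once, and the two maps undo each other.

```python
from itertools import product


def iota(multi_index, shape):
    """Map a multi-index (i_1, ..., i_d) to a linear index (row-major)."""
    linear = 0
    for i, n in zip(multi_index, shape):
        if not 0 <= i < n:
            raise ValueError(f"index {i} out of range for extent {n}")
        linear = linear * n + i
    return linear


def iota_inv(linear, shape):
    """Map a linear index back to the multi-index (inverse of iota)."""
    total = 1
    for n in shape:
        total *= n
    if not 0 <= linear < total:
        raise ValueError(f"linear index {linear} out of range for {total} cells")
    digits = []
    for n in reversed(shape):
        linear, remainder = divmod(linear, n)
        digits.append(remainder)
    return tuple(reversed(digits))


shape = (3, 4)  # hypothetical 3 x 4 grid of cells
all_multi = list(product(*(range(n) for n in shape)))
images = [iota(m, shape) for m in all_multi]

# Bijectivity: the images are exactly 0 .. 11, each hit once.
print(sorted(images) == list(range(12)))
# Mutual inverses: the round trip returns the original multi-index.
print(all(iota_inv(iota(m, shape), shape) == m for m in all_multi))
```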

Figures (4)

  • Figure 1: Nassi-Shneiderman diagram of the state transition function $S$
  • Figure 2: Example of a cell list in one dimension (left) and two dimensions (right). Particles are drawn as lines in one dimension and as dots in two dimensions. The particle storage of one process overlays the cell list in green and red. The red cell marks the cell that corresponds to this storage and its center storage compartment.
  • Figure 3: Nassi-Shneiderman diagram of the distributed-memory parallelization of the outer loop of a particle method with pull interaction. The dashed double lines mark parallel sections of the algorithm.
  • Figure 4: Theoretical speed-ups. The constants are chosen to be $d=2$, $C_u=C_{\alpha}=C_{\beta}=C_{\gamma}=1$, $\tau_i=\tau_e=3$, and $\tau_f=\tau_{\overset{_{\circ }}{e}}=1$. For Amdahl's and Gustafson's laws, $N_{cell}=900$.

Theorems & Definitions (31)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Definition 8
  • Definition 9
  • Definition 10
  • ...and 21 more