The Shamrock code: I- Smoothed Particle Hydrodynamics on GPUs

Timothée David--Cléris, Guillaume Laibe, Yona Lapeyre

TL;DR

Shamrock is a performance-portable, SYCL-based framework for Smoothed Particle Hydrodynamics (SPH) on Exascale architectures. It is built around a fully parallel binary tree used for on-the-fly neighbor searches and ghost-zone management. The SPH solver mirrors that of Phantom but replaces KD-trees with a radix-tree approach, which makes tree construction time negligible and allows scalable multi-GPU execution. Standard hydrodynamics tests and circumbinary disc simulations show accuracy on par with Phantom at high throughput: single-GPU throughput reaches tens of millions of particles per second, and multi-GPU weak scaling approaches $92\%$ efficiency on thousands of GPUs. These results position Shamrock as a platform for high-resolution astrophysical simulations and as a foundation for integrating gravity and multi-physics methods, with room for further performance optimization and for extension to both grid- and particle-based solvers.

Abstract

We present Shamrock, a performance-portable framework developed in C++17 with the SYCL programming standard, tailored for numerical astrophysics on Exascale architectures. The core of Shamrock is an accelerated parallel tree with negligible construction time, whose efficiency is based on binary algebra. The Smoothed Particle Hydrodynamics algorithm of the Phantom code is implemented in Shamrock. On-the-fly tree construction avoids the need for extensive data communication. In uniform-density tests with global time stepping and tens of billions of particles, Shamrock completes a single time step in a few seconds using more than a thousand GPUs of a supercomputer. This corresponds to processing billions of particles per second, with tens of millions of particles per GPU. The parallel efficiency across the entire cluster is larger than $\sim 90\%$.
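The abstract does not spell out what "binary algebra" means in practice. One common ingredient of GPU radix-tree construction is the Morton (Z-order) code, which interleaves the bits of integer grid coordinates into a single sortable key. The sketch below is only an illustration under that assumption: the function names `expand_bits`, `morton3d` and `common_prefix_length` are hypothetical and are not taken from the Shamrock source.

```cpp
#include <cassert>
#include <cstdint>

// Spread the lowest 21 bits of v so that bit b moves to position 3*b.
inline std::uint64_t expand_bits(std::uint32_t v) {
    std::uint64_t out = 0;
    for (int b = 0; b < 21; ++b) {
        out |= static_cast<std::uint64_t>((v >> b) & 1u) << (3 * b);
    }
    return out;
}

// Interleave three 21-bit integer coordinates into one 63-bit Morton code.
// Bit layout (from least significant): z0 y0 x0 z1 y1 x1 ...
inline std::uint64_t morton3d(std::uint32_t ix, std::uint32_t iy, std::uint32_t iz) {
    const std::uint32_t limit = 1u << 21;
    assert(ix < limit && iy < limit && iz < limit);
    return (expand_bits(ix) << 2) | (expand_bits(iy) << 1) | expand_bits(iz);
}

// Number of leading bits shared by two 64-bit codes (64 if they are equal).
// Radix-tree builders typically use this quantity to decide where nodes split.
inline int common_prefix_length(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t diff = a ^ b;
    int n = 0;
    for (int bit = 63; bit >= 0 && ((diff >> bit) & 1u) == 0; --bit) {
        ++n;
    }
    return n;
}
```

Sorting particles by such a key tends to place spatially close particles next to each other in memory, and the tree hierarchy can then be derived from bit operations on the sorted keys alone. Whether Shamrock uses exactly this layout is not stated in this section.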

Paper Structure

This paper contains 107 sections, 34 equations, 29 figures, 2 tables, 7 algorithms.

Figures (29)

  • Figure 1: An example of a tree structure for particle sets and its use in neighbour search: (a) shows the top layer with the particles in the simulation domain, recursively subdivided into 4 subdomains. The particle of interest is highlighted in yellow, domains intersecting the interaction radius in green, and non-intersecting domains in red. (b) illustrates the hierarchical structure of the same tree.
  • Figure 2: Numerical integration of a hydrodynamic quantity $\mathbb{U}$ involves finding neighbours $i$ (particles, cells), then adding their contributions according to the chosen solver $F$.
  • Figure 3: Internal structure of Shamrock: functionalities for calculating neighbour finding are organised in different layers of abstraction, enabling the independent treatment of any numerical scheme (Models).
  • Figure 4: Advection of a density step across several traversals of a periodic box, in code units. SPH being Galilean invariant, the results (black dots) match the initial setup (red crosses) down to machine precision, thus validating the boundary treatment in Shamrock.
  • Figure 5: Result obtained for the Sod-tube test by juxtaposing two tubes of $24\times24\times512$ particles in $x\in[-0.5,0.5]$ and $12\times12\times256$ particles in $x\in[0.5,1.5]$, organised in hexagonal compact packing lattices. The density is set to $\rho=1$ in $x\in[-0.5,0.5]$ and $\rho=0.125$ in $x \in [0.5,1.5]$. Initial pressure is $P=1$ for $x\in[-0.5,0.5]$ and $P=0.1$ for $x\in[0.5,1.5]$, with zero initial velocities. An adiabatic equation of state with $\gamma=1.4$ is used. Boundaries are periodic, and only half of the simulation is displayed. The simulation is run until $t=0.245$, and numerical results are compared against the analytic solution. We additionally show the values of the shock viscosity parameter $\alpha$.
  • ...and 24 more figures
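
The neighbour search described in the Figure 1 caption can be sketched as follows. This is a minimal 2D illustration, assuming a plain recursive subdivision into 4 subdomains that is built during the traversal itself; it is not Shamrock's tree, and the names `Point`, `Box`, `intersects` and `find_neighbours` are hypothetical. Subdomains that intersect the interaction radius (green in the figure) are descended into, the others (red) are skipped.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

struct Point { double x, y; };
struct Box { double xmin, ymin, xmax, ymax; };

// True if the disc of radius r around p overlaps the box.
inline bool intersects(const Box& b, const Point& p, double r) {
    const double dx = std::max({b.xmin - p.x, 0.0, p.x - b.xmax});
    const double dy = std::max({b.ymin - p.y, 0.0, p.y - b.ymax});
    return dx * dx + dy * dy <= r * r;
}

// Collect the indices of all points within distance r of `centre`.
// `ids` lists the points contained in `box`; `depth` bounds the recursion.
inline void find_neighbours(const std::vector<Point>& pts,
                            const std::vector<std::size_t>& ids,
                            const Box& box, const Point& centre, double r,
                            int depth, std::vector<std::size_t>& out) {
    if (!intersects(box, centre, r)) {
        return;  // non-intersecting domain: prune the whole subtree
    }
    if (depth == 0 || ids.size() <= 1) {
        // Leaf: test the remaining candidates directly.
        for (std::size_t i : ids) {
            const double dx = pts[i].x - centre.x;
            const double dy = pts[i].y - centre.y;
            if (dx * dx + dy * dy <= r * r) {
                out.push_back(i);
            }
        }
        return;
    }
    // Split the box into 4 quadrants around its midpoint.
    const double xm = 0.5 * (box.xmin + box.xmax);
    const double ym = 0.5 * (box.ymin + box.ymax);
    const std::array<Box, 4> children = {{
        {box.xmin, box.ymin, xm, ym}, {xm, box.ymin, box.xmax, ym},
        {box.xmin, ym, xm, box.ymax}, {xm, ym, box.xmax, box.ymax}}};
    std::array<std::vector<std::size_t>, 4> child_ids;
    for (std::size_t i : ids) {
        const int q = (pts[i].x >= xm ? 1 : 0) + (pts[i].y >= ym ? 2 : 0);
        child_ids[q].push_back(i);
    }
    for (int q = 0; q < 4; ++q) {
        find_neighbours(pts, child_ids[q], children[q], centre, r, depth - 1, out);
    }
}
```

The pruning test is what keeps the cost well below an all-pairs comparison: a subdomain whose box lies entirely outside the interaction radius cannot contain a neighbour, so none of its particles are inspected.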