Table of Contents
Fetching ...

Smoothed Particle Hydrodynamics in pkdgrav3 for Shock Physics Simulations I: Hydrodynamics

Thomas Meier, Douglas Potter, Christian Reinhardt, Joachim Stadel

Abstract

We present pkdgrav3, a high-performance, fully parallel tree-SPH code designed for large-scale hydrodynamic simulations including self-gravity. Building upon the long development history of pkdgrav, the code combines an efficient hierarchical tree algorithm for gravity and neighbor finding with a modern implementation of Smoothed Particle Hydrodynamics (SPH) optimized for massively parallel hybrid CPU/GPU architectures. Its hybrid shared/distributed memory model, combined with an asynchronous communication scheme, allows pkdgrav3 to scale efficiently to thousands of CPU cores and GPUs. We validate the numerical accuracy of pkdgrav3 using a suite of standard tests, demonstrating excellent agreement with analytic or reference solutions. The code was already used in several peer-reviewed publications to model planetary-scale impacts, where SPH's Lagrangian nature allows accurate tracking of material origin and thermodynamic evolution. These examples highlight pkdgrav3's robustness and efficiency in simulating highly dynamical, self-gravitating systems. pkdgrav3 thus provides a powerful, flexible, and scalable platform for astrophysical and planetary applications, capable of exploiting the full potential of modern heterogeneous high-performance computing systems.

Smoothed Particle Hydrodynamics in pkdgrav3 for Shock Physics Simulations I: Hydrodynamics

Abstract

We present pkdgrav3, a high-performance, fully parallel tree-SPH code designed for large-scale hydrodynamic simulations including self-gravity. Building upon the long development history of pkdgrav, the code combines an efficient hierarchical tree algorithm for gravity and neighbor finding with a modern implementation of Smoothed Particle Hydrodynamics (SPH) optimized for massively parallel hybrid CPU/GPU architectures. Its hybrid shared/distributed memory model, combined with an asynchronous communication scheme, allows pkdgrav3 to scale efficiently to thousands of CPU cores and GPUs. We validate the numerical accuracy of pkdgrav3 using a suite of standard tests, demonstrating excellent agreement with analytic or reference solutions. The code was already used in several peer-reviewed publications to model planetary-scale impacts, where SPH's Lagrangian nature allows accurate tracking of material origin and thermodynamic evolution. These examples highlight pkdgrav3's robustness and efficiency in simulating highly dynamical, self-gravitating systems. pkdgrav3 thus provides a powerful, flexible, and scalable platform for astrophysical and planetary applications, capable of exploiting the full potential of modern heterogeneous high-performance computing systems.

Paper Structure

This paper contains 26 sections, 21 equations, 18 figures.

Figures (18)

  • Figure 1: Left: 2D illustration of the particles (gray dots) in the cell bound (black) with their kernel balls (green) and the resulting ball bounds, the box-of-balls in blue and the ball-of-balls in orange. Center: Combining two cells in the tree build results in combined bounds. The original cell bounds are gray solid lines, while the combined cell bound is shown by a black dashed line. The combination of the boxes of balls (solid light blue lines) is shown as a dashed dark blue line, while the combination of the balls of balls (solid orange lines) is shown as a dashed brown line. Right: During the tree walk, the bounds are compared with each other. The bounds of the group for which the tree is currently walked (sink), are shown in black, dark blue and brown, while the gray, light blue and orange colored bounds represent different cells for which their fate is calculated (source). Depending on the overlap, cells are either opened or rejected. An overlap is only counted if both the box-of-balls and ball-of-balls overlap. Regions with only one overlap are colored green, and those with two overlaps are colored red. Because the cells $c_1, c_3$ and $c_5$ have no red regions, they are rejected in all cases. If only gather neighbors are needed, only the cells $c_2, c_6, c_7$ and $c_8$ have red regions in their cell bound and are accepted while the cell $c_4$ is rejected. If both gather and scatter neighbors are needed, the cell $c_4$ is also accepted because it has a red overlap with the cell bound of the sink.
  • Figure 2: Illustration of how a single time step is done (the kick phase is indicated by the yellow box). First, the density update is done for all particles (or the active particles and their neighbors in the case of FastGas, see text). Then, using the time derivatives from the last step, the velocity and internal energy are predicted (green arrows). Using these values, the force calculation, the closing kick of the last step and the opening kick of the new step are done in succession (light blue arrows), asynchronously for each group of active particles. After these updates are done, the positions are drifted using the new velocities (brown arrows). The red arrows imply a flow of information.
  • Figure 3: $L_1$ error norm of the linear traveling sound wave problem. In blue, the body centered cubic grid shows $N^{-0.898}$ scaling with the number of grid cells in the direction of wave propagation, while in orange the primitive cubic grid shows $N^{-0.769}$ scaling. As expected for traditional SPH, the scaling falls short of first-order and becomes dominated by the floating-point precision used by the code above $N=1024$ (see text for discussion).
  • Figure 4: Results of the Sod shock tube test for two different numbers $N$ of particles sampling the $x$-direction of the low-density region. The values are binned in 250 bins. While the low-resolution ($N=128$) still shows the typical SPH features (smoothed discontinuities, pressure blip), the high-resolution ($N=2048$) solution follows the analytical solution closely and the pressure blip is all but gone.
  • Figure 5: Results of the Sedov-Taylor blast wave test for different resolutions. The top frame shows the density while the bottom frame shows the radial velocity. All values are binned in 100 bins, with the marker showing the average value in the bin while the error bars show maximum and minimum values.
  • ...and 13 more figures