Table of Contents
Fetching ...

A novel algorithm for GPU-accelerated particle-mesh interactions implemented in the QUOKKA code

Chong-Chong He, Benjamin D. Wibking, Aditi Vijayan, Mark R. Krumholz, Pak Shing Li

TL;DR

The paper tackles the challenge of coupling particle-based processes (growth, feedback) with grid-based hydrodynamics on massively parallel GPUs, where traditional neighbor searches and cross-node particle handling are costly. It introduces the particle-mesh-particle (PM-P) algorithm, which deposits particle effects onto a buffer mesh with ghost zones, sums contributions across distributed ranks, applies cell-wise limiters, and then updates both mesh and particle properties in a conservation-respecting manner, requiring only a single inter-processor communication per cycle. Implemented in the QUOKKA code, the method is demonstrated through sink-particle accretion and supernova feedback tests, showing convergence of terminal radial momentum for SN feedback and good agreement with analytic Bondi accretion results for sinks, across multiple resolutions. The PM-P approach offers scalable, bitwise-reproducible particle–mesh coupling for GPU-optimized astrophysical simulations and is extensible to additional feedback processes, marking a significant step toward high-fidelity, large-scale star-formation and feedback modeling on modern hardware.

Abstract

We present a novel, GPU-optimized algorithm for particle-mesh interactions in grid-based hydrodynamics simulations, designed for massively parallel architectures. This approach overcomes the inefficiency of particle neighbour searches or sorts across multiple GPU nodes by using a new ``particle-mesh-particle'' interaction scheme, which extends the particle-mesh method for self-gravity. The algorithm proceeds in two main stages: first, quantities exchanged between particles and the mesh -- such as mass, energy, and momentum added by stellar feedback or removed by accretion onto a sink -- are deposited into a buffer mesh equipped with ghost zones, where multiple contributions per cell are accumulated using atomic additions and then communicated across distributed memory ranks. In the second stage, the buffer states are applied to real mesh states, incorporating cell-wise limiters to enforce physical constraints such as positive density. We implement this scheme in the GPU-native radiation-magnetohydrodynamics code QUOKKA, demonstrating its application to both supernova feedback and sink particle accretion. We demonstrate that the former scheme converges in the terminal radial momentum from multiple supernovae across varying spatial resolutions, while for the latter simulations of accretion in several configurations show excellent agreement with analytic solutions. This scheme enables efficient, scalable particle-mesh coupling for GPU-optimized simulations.

A novel algorithm for GPU-accelerated particle-mesh interactions implemented in the QUOKKA code

TL;DR

The paper tackles the challenge of coupling particle-based processes (growth, feedback) with grid-based hydrodynamics on massively parallel GPUs, where traditional neighbor searches and cross-node particle handling are costly. It introduces the particle-mesh-particle (PM-P) algorithm, which deposits particle effects onto a buffer mesh with ghost zones, sums contributions across distributed ranks, applies cell-wise limiters, and then updates both mesh and particle properties in a conservation-respecting manner, requiring only a single inter-processor communication per cycle. Implemented in the QUOKKA code, the method is demonstrated through sink-particle accretion and supernova feedback tests, showing convergence of terminal radial momentum for SN feedback and good agreement with analytic Bondi accretion results for sinks, across multiple resolutions. The PM-P approach offers scalable, bitwise-reproducible particle–mesh coupling for GPU-optimized astrophysical simulations and is extensible to additional feedback processes, marking a significant step toward high-fidelity, large-scale star-formation and feedback modeling on modern hardware.

Abstract

We present a novel, GPU-optimized algorithm for particle-mesh interactions in grid-based hydrodynamics simulations, designed for massively parallel architectures. This approach overcomes the inefficiency of particle neighbour searches or sorts across multiple GPU nodes by using a new ``particle-mesh-particle'' interaction scheme, which extends the particle-mesh method for self-gravity. The algorithm proceeds in two main stages: first, quantities exchanged between particles and the mesh -- such as mass, energy, and momentum added by stellar feedback or removed by accretion onto a sink -- are deposited into a buffer mesh equipped with ghost zones, where multiple contributions per cell are accumulated using atomic additions and then communicated across distributed memory ranks. In the second stage, the buffer states are applied to real mesh states, incorporating cell-wise limiters to enforce physical constraints such as positive density. We implement this scheme in the GPU-native radiation-magnetohydrodynamics code QUOKKA, demonstrating its application to both supernova feedback and sink particle accretion. We demonstrate that the former scheme converges in the terminal radial momentum from multiple supernovae across varying spatial resolutions, while for the latter simulations of accretion in several configurations show excellent agreement with analytic solutions. This scheme enables efficient, scalable particle-mesh coupling for GPU-optimized simulations.

Paper Structure

This paper contains 24 sections, 26 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Summary of the steps in the particle-mesh-particle algorithm for particle-gas interactions on GPU. In the example shown, there are two particles (red stars) located near the boundaries of two grids stored in memory on different GPUs; the thick black line indicates the boundary. Blue colouring shows real cells, while green indicates the buffer. Cells with dashed edges are ghost zones used for communication; we show only one ghost zone in this schematic for compactness, but real calculations use a larger number.
  • Figure 2: Isothermal sphere test. The top two panels show the mean density and radial velocity as a function of radius in the simulation (dashed green line) compared to the analytic solution (blue solid line). The lower panel shows the sink particle mass versus time, again comparing the simulation and exact analytic solutions.
  • Figure 3: Comparison of simulated accretion rates over time with theoretical values for the Bondi accretion tests presented in \ref{['ssec:bondi']}. The ratios of the Bondi radius to spatial resolution are, from left to right and top to bottom, 0.1, 0.32, 1.0, 3.16, and 10. The dashed horizontal line denotes the theoretical accretion rate, and the solid lines show the accretion rate measured in the simulations. The horizontal axis gives time in units of the Bondi time.
  • Figure 4: Top: Accretion rate versus time for the Bondi-Hoyle accretion test described in \ref{['ssec:bh']}. The accretion rate is normalized to the Bondi-Hoyle accretion rate, and the time is normalized to Bondi-Hoyle time. Bottom: Snapshot of the density field in the $xy$-plane at $t \approx ~6 t_{\rm BH}$. White squares indicate the borders of the AMR levels.
  • Figure 5: Tests of SN feedback prescriptions using simulations of radiative SNR evolution in a uniform medium with spatial resolution $\Delta x = 4$ pc, both using our fiducial prescription (orange) and a scheme that deposits thermal energy only (blue). The top panel shows the final radial momentum, while the bottom shows the maximum temperature attained during the simulation, both shown as functions of the ambient hydrogen number density. The grey stripe in the upper panel indicates the empirical $p_{\rm r,max} - n_{\rm H, amb}$ relation with $25\%$ tolerance. The upper axis shows the ratio of kernel radius ($=3\Delta x$) to the shell-formation radius, and the background shading indicates which case of our fiducial SN deposition scheme is used for each simulation given that ratio.
  • ...and 2 more figures