GPU-Native Compressed Neighbor Lists with a Space-Filling-Curve Data Layout

Felix Thaler, Sebastian Keller

TL;DR

A compressed neighbor list for short-range particle-particle interaction based on a space-filling curve (SFC) memory layout and particle clusters, which seamlessly integrates with octree-based domain decomposition and multipole-based methods for long-range gravitational or electrostatic interactions.

Abstract

We have developed a compressed neighbor list for short-range particle-particle interaction based on a space-filling curve (SFC) memory layout and particle clusters. The neighbor list can be constructed efficiently on GPUs, supporting NVIDIA and AMD hardware, and has a memory footprint of only 4 bytes per particle to store approximately 200 neighbors. Compared to the highly optimized, domain-specific neighbor list implementation of GROMACS, a molecular dynamics code, it has a comparable cluster overhead and delivers similar performance in a neighborhood pass. Thanks to the SFC-based data layout and the support for varying interaction radii per particle, our neighbor list performs well for systems with high density contrasts, such as those encountered in many astrophysical and cosmological applications. Due to the close relation between SFCs and octrees, our neighbor list seamlessly integrates with octree-based domain decomposition and multipole-based methods for long-range gravitational or electrostatic interactions. To demonstrate the coupling between long- and short-range forces, we simulate an Evrard collapse, a standard test case for the coupling between hydrodynamical and gravitational forces, on up to 1024 GPUs, and compare our results to the analytical solution.
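
To make the idea of an SFC memory layout with particle clusters concrete, the following is a minimal CPU-side sketch, not the implementation described in the paper. It assumes a Morton (Z-order) curve as the SFC, a cubic box, 10 bits per dimension, and a fixed cluster size; the abstract specifies none of these, and all function names (`spread_bits`, `morton_key`, `sfc_order`, `make_clusters`) are hypothetical.

```python
def spread_bits(v: int) -> int:
    """Spread the low 10 bits of v so that two zero bits separate each bit."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v


def morton_key(ix: int, iy: int, iz: int) -> int:
    """Interleave three 10-bit grid indices into one 30-bit Morton key."""
    return (spread_bits(ix) << 2) | (spread_bits(iy) << 1) | spread_bits(iz)


def sfc_order(positions, box: float, bits: int = 10):
    """Return particle indices sorted by Morton key (assumed cubic box [0, box)^3)."""
    n = 1 << bits
    keys = []
    for x, y, z in positions:
        # Map each coordinate to an integer grid cell, clamped to the last cell.
        ix = min(int(x / box * n), n - 1)
        iy = min(int(y / box * n), n - 1)
        iz = min(int(z / box * n), n - 1)
        keys.append(morton_key(ix, iy, iz))
    return sorted(range(len(positions)), key=lambda i: keys[i])


def make_clusters(order, cluster_size: int):
    """Group consecutive SFC-sorted particles into clusters (size is an assumption)."""
    return [order[i:i + cluster_size] for i in range(0, len(order), cluster_size)]


positions = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.1, 0.1, 0.6), (0.12, 0.1, 0.1)]
order = sfc_order(positions, box=1.0)
clusters = make_clusters(order, cluster_size=2)
```

Because particles that are adjacent along the curve tend to be close in space, consecutive entries of `order` form spatially compact clusters, and the high bits of each key identify the octree node that contains the particle.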

Paper Structure

This paper contains 14 sections, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: The data arrays used by GROMACS. Arrows indicate references to other data.
  • Figure 2: The data arrays used in our implementation. Arrows indicate references to other data.
  • Figure 3: Cluster overhead of GROMACS’ clustering implementation, as presented in [pall_flexible_2013], vs. cluster overhead of our SFC-based approach. Number density $\rho=100\,\mathrm{nm}^{-3}$.
  • Figure 4: Performance of our pair interaction kernel for Lennard-Jones forces compared to LAMMPS and GROMACS, on the NVIDIA GH200 [noauthor_grace_2025].
  • Figure 5: Performance of our pair interaction kernel for Lennard-Jones forces compared to LAMMPS and GROMACS, on the AMD MI300A [noauthor_amd_2025].
  • ...and 6 more figures