Advances in ArborX to support exascale applications

Andrey Prokopenko, Daniel Arndt, Damien Lebrun-Grandié, Bruno Turcksin, Nicholas Frontiere, J. D. Emberson, Michael Buehlmann

TL;DR

ArborX, a performance-portable geometric search library, is applied to exascale cosmology with HACC to address in-situ halo finding via DBSCAN. The authors introduce extensive interface and algorithmic improvements (stackless traversal, 64-bit Morton codes, pair traversal, callbacks, and early termination) and develop DBSCAN variants (Initial, Reformulated, FDBSCAN, FDBSCAN-DenseBox) that deliver up to 10–12× speedups for FOF and ~2× improvements in the full time-stepper on Summit, with a 37M-point benchmark clustering in 0.15 s on an NVIDIA A100. These advances enable in-situ substructure finding during production runs and demonstrate strong performance portability across GPU vendors. The work sets the stage for further enhancements, including HDBSCAN*, a SYCL port for Aurora, and auto-tuning interfaces, highlighting substantial practical impact for large-scale cosmology simulations.

Abstract

ArborX is a performance-portable geometric search library developed as part of the Exascale Computing Project (ECP). In this paper, we explore a collaboration between ArborX and the cosmological simulation code HACC. Large cosmological simulations on exascale platforms encounter a bottleneck due to the in-situ analysis requirements of halo finding, the problem of identifying dense clusters of dark matter (halos). This problem is solved using the density-based clustering algorithm DBSCAN. With each MPI rank handling hundreds of millions of particles, it is imperative for the DBSCAN implementation to be efficient. In addition, the requirement to support exascale supercomputers from different vendors necessitates performance portability of the algorithm. We describe how this challenge problem guided ArborX development and enhanced the performance and scope of the library. We explore improvements to the basic algorithms of the underlying search index, and describe several implementations of DBSCAN in ArborX. Further, we report the history of the changes in ArborX and their effect on the time to solve a representative benchmark problem, and demonstrate the real-world impact on production end-to-end cosmology simulations.

Paper Structure (19 sections, 10 figures, 1 table, 2 algorithms)

Figures (10)

  • Figure 1: Visualization of the performance impact of ArborX on analysis steps for a production gravity-only cosmology simulation.
  • Figure 2: Visualization of in-situ substructure finding of a large particle cluster from a hydrodynamic simulation using DBSCAN. Image credit: Azton Wells, Argonne National Laboratory.
  • Figure 3: Classification of points for DBSCAN with $\textit{minPts} = 4$. Core points are shown in red, border points in blue, and noise points in gray.
  • Figure 4: Benchmark problem data sampled from a single rank. The clusters are clearly formed.
  • Figure 5: Callbacks interface. The return type RT can be either void or the enum CallbackTreeTraversalControl. The latter affects the traversal, allowing early termination (see the section on early termination).
  • ...and 5 more figures
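
The core/border/noise classification named in the Figure 3 caption can be illustrated with a short brute-force sketch. This is not ArborX code and not the authors' implementation: the function name `classify_points`, the sample coordinates, and the value of `eps` are hypothetical, and the sketch assumes the common DBSCAN convention that a point counts toward its own neighborhood. It uses an O(n²) neighbor scan purely for clarity, whereas the paper relies on a tree-based search index.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' by brute force (O(n^2))."""
    n = len(points)
    # Neighborhood of i: indices of all points within eps of point i.
    # Assumption: the point itself is included in its own neighborhood.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points in its neighborhood.
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels


# Hypothetical 2D sample: a small clump plus one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (5.0, 5.0)]
labels = classify_points(points, eps=1.1, min_pts=4)
for p, label in zip(points, labels):
    print(p, label)
```

In this sample only (1.0, 1.0) has four points within `eps`, so it is the single core point; its three neighbors become border points, and the remaining two points are noise.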