Advances in ArborX to support exascale applications
Andrey Prokopenko, Daniel Arndt, Damien Lebrun-Grandié, Bruno Turcksin, Nicholas Frontiere, J. D. Emberson, Michael Buehlmann
TL;DR
ArborX, a performance-portable geometric search library, is applied to exascale cosmology with HACC to address in-situ halo finding via DBSCAN. The authors introduce extensive interface and algorithmic improvements (stackless traversal, 64-bit Morton codes, pair traversal, callbacks, and early termination) and develop DBSCAN variants (Initial, Reformulated, FDBSCAN, FDBSCAN-DenseBox) that deliver speedups of up to 10–12× for friends-of-friends (FOF) halo finding and ~2× improvements in the full time-stepper on Summit, with a 37M-point benchmark clustering in 0.15 s on an NVIDIA A100. These advances enable in-situ substructure finding during production runs and demonstrate strong performance portability across GPU vendors. The work sets the stage for further enhancements, including HDBSCAN*, a SYCL port for Aurora, and auto-tuning interfaces, highlighting substantial practical impact for large-scale cosmology simulations.
Abstract
ArborX is a performance-portable geometric search library developed as part of the Exascale Computing Project (ECP). In this paper, we explore a collaboration between ArborX and the cosmological simulation code HACC. Large cosmological simulations on exascale platforms encounter a bottleneck due to the in-situ analysis requirements of halo finding, the problem of identifying dense clusters of dark matter (halos). This problem is solved using DBSCAN, a density-based clustering algorithm. With each MPI rank handling hundreds of millions of particles, it is imperative for the DBSCAN implementation to be efficient. In addition, the requirement to support exascale supercomputers from different vendors necessitates performance portability of the algorithm. We describe how this challenge problem guided ArborX development and enhanced the performance and scope of the library. We explore performance improvements to the basic algorithms of the underlying search index, and describe several implementations of DBSCAN in ArborX. Further, we report the history of the changes in ArborX and their effect on the time to solve a representative benchmark problem, and demonstrate the real-world impact on production end-to-end cosmology simulations.
