
Cosmological Hydrodynamics at Exascale: A Trillion-Particle Leap in Capability

Nicholas Frontiere, J. D. Emberson, Michael Buehlmann, Esteban M. Rangel, Salman Habib, Katrin Heitmann, Patricia Larsen, Vitali Morozov, Adrian Pope, Claude-André Faucher-Giguère, Antigoni Georgiadou, Damien Lebrun-Grandié, Andrey Prokopenko

TL;DR

Frontier-E demonstrates the first exascale cosmological hydrodynamics simulation at trillion-particle scale, combining gravity with detailed gas dynamics in a 15.3 Gly-scale domain. The CRK-HACC framework employs separation-of-scale gravity, a GPU-resident short-range solver, warp splitting, in situ GPU analysis, and multi-tiered I/O to achieve high end-to-end throughput on Frontier. The run attains a peak of 513 PFLOPs and a sustained 420 PFLOPs while processing 46.6B particles/s and generating over 100 PB of data in about a week, with near-ideal strong and weak scaling. These results establish a new capability baseline for survey-scale predictions, enabling joint multi-wavelength analyses and inviting broader application of the approach to other particle-based simulations and exascale workflows.
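
Separation-of-scale gravity splits the Newtonian force into a smooth long-range part, which a spectral FFT solver can handle on a global mesh, and a rapidly decaying short-range part, which can be computed locally (here, on the GPU). As a minimal illustration only, the sketch below uses the Gaussian (erfc) split familiar from TreePM codes such as GADGET-2; this is an assumed stand-in, not CRK-HACC's own filter, and the function names and the split scale `r_s` are hypothetical.

```python
import math


def short_range_factor(r: float, r_s: float) -> float:
    """Fraction of the Newtonian 1/r^2 force assigned to the short-range solver.

    Assumption: Gaussian (erfc) force split as used in TreePM codes; CRK-HACC
    uses its own filtered split, so this is illustrative only.
    """
    x = r / (2.0 * r_s)
    return math.erfc(x) + (r / (math.sqrt(math.pi) * r_s)) * math.exp(-x * x)


def split_force(r: float, r_s: float, G: float = 1.0, m: float = 1.0):
    """Return (short, long) radial force magnitudes that sum to G*m/r^2."""
    newton = G * m / (r * r)
    short = newton * short_range_factor(r, r_s)
    long_range = newton - short  # smooth remainder, handled by the FFT solver
    return short, long_range


# Near the particle almost all force is short-range; beyond ~10 r_s it is negligible.
for r in (0.01, 1.0, 2.0, 5.0, 10.0):
    print(f"r = {r:5.2f} r_s  short-range fraction = {short_range_factor(r, 1.0):.3e}")
```

Because the short-range factor falls off faster than exponentially, the short-range solver can use a finite cutoff of a few `r_s`, which is what makes a local, GPU-resident tree solver possible alongside a global long-range FFT.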

Abstract

Resolving the most fundamental questions in cosmology requires simulations that match the scale, fidelity, and physical complexity demanded by next-generation sky surveys. To achieve the realism needed for this critical scientific partnership, detailed gas dynamics, along with a host of astrophysical effects, must be treated self-consistently with gravity for end-to-end modeling of structure formation. As an important step on this roadmap, exascale computing enables simulations that span survey-scale volumes while incorporating key subgrid processes that shape complex cosmic structures. We present results from CRK-HACC, a cosmological hydrodynamics code built for the extreme scalability requirements set by modern cosmological surveys. Using separation-of-scale techniques, GPU-resident tree solvers, in situ analysis pipelines, and multi-tiered I/O, CRK-HACC executed Frontier-E: a four-trillion-particle full-sky simulation, over an order of magnitude larger than previous efforts. The run achieved 513.1 PFLOPs peak performance, processing 46.6 billion particles per second and writing more than 100 PB of data in just over one week of runtime.
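
The headline figures can be put in perspective with a little arithmetic. The sketch below uses only numbers reported on this page (513.1 PFLOPs peak, 420.5 PFLOPs sustained, more than 100 PB written, just over one week of runtime); treating the runtime as exactly 7 days and the data volume as exactly 100 PB are simplifying assumptions, so the derived rates are rough estimates, not measured values.

```python
SECONDS_PER_DAY = 86_400

peak_pflops = 513.1        # reported peak performance
sustained_pflops = 420.5   # reported sustained performance (Figure 4)
data_written_pb = 100.0    # "more than 100 PB" (assumed exactly 100 PB here)
runtime_days = 7.0         # "just over one week" (assumed exactly 7 days here)

# Fraction of peak FLOP rate maintained over the run.
sustained_fraction = sustained_pflops / peak_pflops

# Time-averaged output rate, converting PB to TB (1 PB = 1000 TB).
avg_write_tb_per_s = data_written_pb * 1e3 / (runtime_days * SECONDS_PER_DAY)

print(f"sustained/peak: {sustained_fraction:.3f}")
print(f"average output rate: {avg_write_tb_per_s:.3f} TB/s")
```

Under these assumptions the run sustains roughly 82% of its peak FLOP rate, and the time-averaged output rate is on the order of 0.17 TB/s, well below the 0.75 to 3.75 TB/s PFS bandwidth band shown in Figure 5, which is consistent with output being written in bursts rather than continuously.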

Paper Structure

This paper contains 19 sections, 2 equations, 6 figures, 1 table, and 1 algorithm.

Figures (6)

  • Figure 1: Comparison of large-volume simulations for gravity-only solvers (black markers) and state-of-the-art cosmological hydrodynamics solvers (colored markers). The Frontier-E simulation is the first to break the trillion-particle barrier, reaching the same scale as leading gravity-only counterparts. "Resolution Elements" refers to the count of dark matter–baryon particle pairs in hydrodynamic simulations, to allow a fair comparison with single-species gravity-only runs. The dotted line indicates the particle count required to match the mass resolution of Frontier-E as a function of simulation volume.
  • Figure 2: CRK-HACC architecture diagram of the primary simulation components, spanning gigalight-year volumes down to short-range forces acting on individual particles. The distributed long-range spectral FFT solver operates over the global domain across all nodes (top left). After k-d trees are constructed in chaining mesh bins, the entire overloaded rank is pushed to the GPU (top right), where short-range force operators process leaf-leaf interactions using warp-splitting kernels. Cluster-based in situ analysis is also GPU-accelerated (bottom right). Multi-tier I/O (bottom left) outputs data using synchronous writes to node-local NVMe SSDs, which bleed data to the PFS asynchronously. For Frontier-E, the time-to-solution contributions from the long-range solver, tree build, short-range solver, in situ analysis, and I/O were 1.7%, 1.7%, 79.6%, 11.6%, and 2.6%, respectively. Over 90% of solver time was executed on the GPU; see the time-to-solution section.
  • Figure 3: Slices of total matter density (top panels) and gas temperature (bottom panels) from four ranks of the Frontier-E simulation at high redshift ($z = 9$; early universe, left) and low redshift ($z = 0$; late universe, right). Dashed lines indicate rank boundaries.
  • Figure 4: Strong (left axis, in red) and weak (right axis, in blue) scaling from 128 to 9,000 nodes on Frontier, with the lower panel showing efficiency relative to the ideal case. Weak scaling is presented in terms of the number of particles processed per second by the solver. The Frontier-E problem size is indicated by the star (46.6 billion particles per second), where we achieved 513.1 PFLOPs peak and 420.5 PFLOPs sustained performance.
  • Figure 5: Top: Cumulative time-to-solution of the Frontier-E simulation, along with individual timers for the short- and long-range solvers, I/O, tree construction, and analysis (about 2.8% of the simulation time is spent in global reductions and miscellaneous software not individually visualized). Note that redshift decreases non-linearly with cosmic time, so late stages of the simulation span a larger fraction of the Universe’s age. Bottom: NVMe SSD and PFS bandwidth of the multi-tiered I/O strategy, with the gray band bracketing the 0.75 to 3.75 TB/s PFS bandwidth. The shaded red region tracks the total data written during the Frontier-E run.
  • ...and 1 more figure
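
The multi-tier I/O pattern described in Figures 2 and 5 (a synchronous write to fast node-local storage, followed by an asynchronous "bleed" to the parallel file system) can be sketched as a producer/consumer queue. This is a minimal single-process illustration under stated assumptions: two temporary directories stand in for the NVMe SSD and the PFS, and the class and method names (`TieredWriter`, `write`, `close`) are hypothetical, not CRK-HACC's interface.

```python
import os
import queue
import shutil
import tempfile
import threading


class TieredWriter:
    """Hypothetical two-tier writer: synchronous write to a fast local tier,
    then an asynchronous background copy ("bleed") to a slower shared tier."""

    def __init__(self, local_dir: str, pfs_dir: str):
        self.local_dir = local_dir  # stand-in for node-local NVMe
        self.pfs_dir = pfs_dir      # stand-in for the parallel file system
        self._pending = queue.Queue()
        self._drainer = threading.Thread(target=self._drain, daemon=True)
        self._drainer.start()

    def write(self, name: str, payload: bytes) -> None:
        # Synchronous part: the caller blocks only for the fast local write.
        with open(os.path.join(self.local_dir, name), "wb") as f:
            f.write(payload)
        # Hand the file to the background thread; the caller resumes immediately.
        self._pending.put(name)

    def _drain(self) -> None:
        # Background thread: copy each file to the PFS tier, then free local space.
        while True:
            name = self._pending.get()
            if name is None:  # sentinel from close()
                return
            src = os.path.join(self.local_dir, name)
            shutil.copyfile(src, os.path.join(self.pfs_dir, name))
            os.remove(src)

    def close(self) -> None:
        # FIFO order guarantees all queued files are copied before the sentinel.
        self._pending.put(None)
        self._drainer.join()


# Usage: three "snapshots" written locally, drained to the PFS stand-in.
local = tempfile.mkdtemp()
pfs = tempfile.mkdtemp()
writer = TieredWriter(local, pfs)
for step in range(3):
    writer.write(f"step{step}.bin", bytes([step]) * 1024)
writer.close()
print(sorted(os.listdir(pfs)))  # ['step0.bin', 'step1.bin', 'step2.bin']
print(os.listdir(local))        # []
```

The design choice this illustrates is that the simulation pays only the local-write cost on its critical path, while the slower transfer to shared storage overlaps with subsequent computation.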