Table of Contents
Fetching ...

Radiation Hydrodynamics at Scale: Comparing MPI and Asynchronous Many-Task Runtimes with FleCSI

Alexander Strack, Hartmut Kaiser, Dirk Pflüger

TL;DR

This work benchmarked two applications using FleCSI's three backends on up to 1024 nodes, intending to quantify the advantages and overheads introduced by the AMTR backends.

Abstract

Writing efficient distributed code remains a labor-intensive and complex endeavor. To simplify application development, the Flexible Computational Science Infrastructure (FleCSI) framework offers a user-oriented, high-level programming interface that is built upon a task-based runtime model. Internally, FleCSI integrates state-of-the-art parallelization backends, including MPI and the asynchronous many-task runtimes (AMTRs) Legion and HPX, enabling applications to fully leverage asynchronous parallelism. In this work, we benchmark two applications using FleCSI's three backends on up to 1024 nodes, intending to quantify the advantages and overheads introduced by the AMTR backends. As representative applications, we select a simple Poisson solver and the multidimensional radiation hydrodynamics code HARD. In the communication-focused Poisson solver benchmark, FleCSI achieves over 97% parallel efficiency using the MPI backend under weak scaling on up to 131072 cores, indicating that only minimal overhead is introduced by its abstraction layer. While the Legion backend exhibits notable overheads and scaling limitations, the HPX backend introduces only marginal overhead compared to MPI+Kokkos. However, the scalability of the HPX backend is currently limited due to the usage of non-optimized HPX collective operations. In the computation-focused radiation hydrodynamics benchmarks, the performance gap between the MPI and HPX backends fades. On fewer than 64 nodes, the HPX backend outperforms MPI+Kokkos, achieving an average speedup of 1.31 under weak scaling and up to 1.27 under strong scaling. For the hydrodynamics-only HARD benchmark, the HPX backend demonstrates superior performance on fewer than 32 nodes, achieving speedups of up to 1.20 relative to MPI and up to 1.64 relative to MPI+Kokkos.

Radiation Hydrodynamics at Scale: Comparing MPI and Asynchronous Many-Task Runtimes with FleCSI

TL;DR

This work benchmarked two applications using FleCSI's three backends on up to 1024 nodes, intending to quantify the advantages and overheads introduced by the AMTR backends.

Abstract

Writing efficient distributed code remains a labor-intensive and complex endeavor. To simplify application development, the Flexible Computational Science Infrastructure (FleCSI) framework offers a user-oriented, high-level programming interface that is built upon a task-based runtime model. Internally, FleCSI integrates state-of-the-art parallelization backends, including MPI and the asynchronous many-task runtimes (AMTRs) Legion and HPX, enabling applications to fully leverage asynchronous parallelism. In this work, we benchmark two applications using FleCSI's three backends on up to 1024 nodes, intending to quantify the advantages and overheads introduced by the AMTR backends. As representative applications, we select a simple Poisson solver and the multidimensional radiation hydrodynamics code HARD. In the communication-focused Poisson solver benchmark, FleCSI achieves over 97% parallel efficiency using the MPI backend under weak scaling on up to 131072 cores, indicating that only minimal overhead is introduced by its abstraction layer. While the Legion backend exhibits notable overheads and scaling limitations, the HPX backend introduces only marginal overhead compared to MPI+Kokkos. However, the scalability of the HPX backend is currently limited due to the usage of non-optimized HPX collective operations. In the computation-focused radiation hydrodynamics benchmarks, the performance gap between the MPI and HPX backends fades. On fewer than 64 nodes, the HPX backend outperforms MPI+Kokkos, achieving an average speedup of 1.31 under weak scaling and up to 1.27 under strong scaling. For the hydrodynamics-only HARD benchmark, the HPX backend demonstrates superior performance on fewer than 32 nodes, achieving speedups of up to 1.20 relative to MPI and up to 1.64 relative to MPI+Kokkos.
Paper Structure (13 sections, 4 equations, 7 figures, 1 table)

This paper contains 13 sections, 4 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: FleCSI backend stack
  • Figure 2: Problem size scaling of one Poisson solver iteration on one node. For Legion, the runtimes do not include tracing optimization overhead. Linear scaling is indicated with a dashed line. The run configuration is denoted with $(\#process/\#threads)$.
  • Figure 3: Strong scaling runtimes of one Poisson solver iteration on up to 1024 nodes. The problem size is fixed to $2^{28}$. Ideal single-node performance is indicated with dashed lines. The run configuration per node is denoted with $(\#process/\#threads)$.
  • Figure 4: Weak scaling runtimes of one Poisson solver iteration on up to 1024 nodes. The problem size per node is set to $2^{28}$. Ideal scaling is indicated with a dashed line. The run configuration per node is denoted with $(\#process/\#threads)$.
  • Figure 5: Strong scaling runtimes of one iteration of the three-dimensional radiation hydrodynamics benchmark with HARD on up to 1024 Chicoma nodes (see Figure \ref{['tab:specs']}). The problem size is fixed to $2^{9+9+9}$. Ideal scaling is indicated with a dashed line. The run configuration per node is denoted with $(\#process/\#threads)$.
  • ...and 2 more figures