Preparing for HPC on RISC-V: Examining Vectorization and Distributed Performance of an Astrophysics Application with HPX and Kokkos
Patrick Diehl, Panagiotis Syskakis, Gregor Daiß, Steven R. Brandt, Alireza Kheirkhahan, Srinivas Yadav Singanaboina, Dominic Marcello, Chris Taylor, John Leidel, Hartmut Kaiser
TL;DR
This work assesses the viability of HPC on desktop-grade RISC-V hardware. It ports the astrophysics code Octo-Tiger to a RISC-V+HPX+Kokkos stack and introduces a backend for std::experimental::simd based on the RISC-V Vector extension (RVV). It demonstrates how RVV vectorization and HPX/Kokkos integration enable scalable performance on a two-node Milk-V Pioneer cluster and a Banana Pi board, with comparisons to the A64FX-based Fugaku system. Key contributions include a practical RVV library implementation, targeted HPX optimizations for RISC-V atomics, and detailed node-level and distributed performance and power measurements across multiple real-world astrophysical scenarios (DWD, v1309). The results indicate that RISC-V hardware can approach or exceed certain performance metrics of contemporary ARM-based HPC nodes while consuming less power. This supports cautious optimism about RISC-V as a viable HPC platform and guides future heterogeneous and vector-enabled developments.
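To illustrate the programming model, the sketch below writes a hypothetical AXPY kernel (not taken from Octo-Tiger) against the std::experimental::simd interface from the Parallelism TS v2. The vector width comes from the native_simd ABI tag, so the same source compiles against whichever SIMD backend the toolchain provides. It assumes GCC's libstdc++ (version 11 or later), which ships <experimental/simd>. It does not reproduce the paper's RVV backend.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>
#include <vector>

namespace stdx = std::experimental;

// Hypothetical kernel: y[i] = a * x[i] + y[i], written once against the
// portable simd interface. The lane count V::size() is fixed by the ABI
// tag chosen by the toolchain for the target.
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    using V = stdx::native_simd<double>;
    const std::size_t n = x.size();
    const std::size_t w = V::size();
    std::size_t i = 0;
    // Main loop: process full chunks of w lanes.
    for (; i + w <= n; i += w) {
        V vx(&x[i], stdx::element_aligned);  // load w elements of x
        V vy(&y[i], stdx::element_aligned);  // load w elements of y
        vy = a * vx + vy;                    // scalar a is broadcast to all lanes
        vy.copy_to(&y[i], stdx::element_aligned);
    }
    // Scalar remainder: fixed-width SIMD needs a tail loop when w does not divide n.
    for (; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The tail loop is required only because the lane count is fixed at compile time. Variable-length vector hardware can handle this bookkeeping itself.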
Abstract
In recent years, interest in RISC-V computing architectures has moved from academia to the mainstream, especially in High Performance Computing, where energy limitations are an increasing concern. As of this year, the first RISC-V single-board computers implementing the ratified vector specification are being released. The RISC-V vector specification follows the tradition of the vector processors found in the CDC STAR-100, the Cray-1, the Convex C-Series, and the NEC SX machines and accelerators. Vector processors support variable-length array processing, in contrast to the fixed-length processing offered by SIMD. They also enable vector chaining, which allows intermediate results to be used without resolving memory references. In this work, we use Octo-Tiger, a multi-physics, multi-scale, 3D adaptive mesh refinement astrophysics application, to study these early RISC-V chips with vector support. We report on our experience porting this modern C++ code, which is built on several open-source libraries such as HPX and Kokkos, to RISC-V. In addition, we show the impact of the RISC-V Vector extension on a RISC-V single-board computer by implementing the std::experimental::simd interface and integrating it with our code. We also compare the application's performance, scalability, and power consumption on a desktop-grade RISC-V computer with those on an A64FX system.
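The following portable C++ sketch models the strip-mining pattern used with the RISC-V vector extension. It makes the contrast between variable-length vector processing and fixed-length SIMD concrete. The helper set_vl is a hypothetical stand-in for the vsetvli instruction, and vlmax is an assumed parameter. On real hardware, VLMAX depends on the vector register length, element width, and register grouping, and production code would use RVV intrinsics or compiler auto-vectorization.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for vsetvli: the hardware grants
// vl = min(requested, VLMAX) elements for the next vector operation.
std::size_t set_vl(std::size_t requested, std::size_t vlmax) {
    return std::min(requested, vlmax);
}

// Strip-mined, vector-length-agnostic AXPY (y = a * x + y).
// Each pass handles vl elements. The final pass simply receives a shorter vl,
// so unlike fixed-width SIMD code no scalar remainder loop is needed.
// Returns the number of strips (loop trips) executed.
std::size_t vla_axpy(double a, const double* x, double* y, std::size_t n,
                     std::size_t vlmax) {
    assert(vlmax > 0);  // a zero VLMAX would never make progress
    std::size_t strips = 0;
    for (std::size_t i = 0; i < n;) {
        const std::size_t vl = set_vl(n - i, vlmax);
        // This inner loop models one vector instruction operating on vl lanes.
        for (std::size_t j = 0; j < vl; ++j) {
            y[i + j] = a * x[i + j] + y[i + j];
        }
        i += vl;
        ++strips;
    }
    return strips;
}
```

With n = 10 and vlmax = 4, the loop runs three strips of 4, 4, and 2 elements. The same binary logic adapts to any VLMAX without recompilation, which is the property that distinguishes this model from fixed-width SIMD.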
