Monte Cimone v2: Down the Road of RISC-V High-Performance Computers

Emanuele Venieri, Simone Manoni, Gabriele Ceccolini, Giacomo Madella, Federico Ficarelli, Daniele Gregori, Daniele Cesarini, Luca Benini, Andrea Bartolini

TL;DR

The paper investigates the readiness of RISC-V for high-performance computing by upgrading the Monte Cimone cluster to MCv2 using the Sophgo SG2042 and evaluating HPC workloads. It documents the hardware architecture, software-stack enhancements (notably BLIS porting and RVV adaptation), and a benchmarking pipeline based on STREAM and HPL to measure memory bandwidth and FP64 performance. The upgrade yields substantial gains, with the MCv2 node achieving about 127 times higher HPL DP FLOP/s and 69 times higher STREAM bandwidth compared to MCv1, and demonstrates that BLIS optimization can match or surpass OpenBLAS at scale on the SG2042. Overall, the work highlights rapid maturation of the SG2042-based RISC-V HPC ecosystem and the practical viability of high-performance libraries on RVV 0.7.1 for datacenter workloads.

Abstract

Many RISC-V (RV) platforms and SoCs targeting the HPC sector have been announced in recent years, but only a few are commercially available and engineered to meet HPC requirements. The Monte Cimone project set out to assess their capabilities and maturity, with the aim of making RISC-V a competitive choice for building a datacenter. Systems-on-chip (SoCs) featuring RV cores with a vector extension, and with a form factor and memory capacity suitable for HPC applications, are now available on the market, but it is unclear how well compilers and open-source libraries can take advantage of their performance. In this paper, we describe the performance assessment of the upgraded Monte Cimone cluster (MCv2), based on the Sophgo SG2042 processor, on HPC workloads. We also explore the optimization of BLAS libraries. The upgrade increases the attained node performance by 127x on HPL DP FLOP/s and 69x on STREAM memory bandwidth.
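To make the two headline metrics concrete, the following minimal sketch shows how a DP FLOP/s figure and a memory-bandwidth figure are conventionally derived from a timed run. It assumes the standard HPL operation count (2/3 N^3 + 3/2 N^2) and the standard STREAM triad byte count (three 8-byte accesses per element). The function names and the sample inputs are hypothetical and are not taken from the paper.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """GFLOP/s for an HPL run on an n x n system, using the
    conventional HPL operation count 2/3*n^3 + 3/2*n^2."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


def stream_triad_gbs(n_elements: int, seconds: float,
                     bytes_per_element: int = 8) -> float:
    """GB/s for one STREAM triad pass a[i] = b[i] + q * c[i].
    STREAM counts two reads and one write per element."""
    bytes_moved = 3 * bytes_per_element * n_elements
    return bytes_moved / seconds / 1e9


def speedup(new: float, old: float) -> float:
    """Ratio of a new measurement to a baseline measurement."""
    return new / old


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the formulas.
    print(f"HPL:    {hpl_gflops(3000, 18.0):.3f} GFLOP/s")
    print(f"STREAM: {stream_triad_gbs(10_000_000, 0.024):.1f} GB/s")
```

A node-level improvement such as the ones quoted above is then the ratio of the same metric measured on the two systems, for example `speedup(mcv2_gflops, mcv1_gflops)` with values measured by the reader.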

Paper Structure

This paper contains 13 sections and 7 figures.

Figures (7)

  • Figure 1: Monte Cimone v1 (green) + v2 (blue) view
  • Figure 2: Focus point of our micro-kernel optimization
  • Figure 3: STREAM benchmark on an MCv2 node with 64 OpenMP threads compared to an MCv1 node
  • Figure 4: MCv2 HPL with OpenBLAS (generic and optimized compilation targets)
  • Figure 5: HPL on different node configurations
  • ...and 2 more figures