Performance characterisation of the 64-core SG2042 RISC-V CPU for HPC
Nick Brown, Maurice Jamieson
TL;DR
This work assesses the HPC viability of the 64-core Sophon SG2042 RISC-V CPU using NASA’s NAS Parallel Benchmark (NPB) suite across RISC-V, x86-64, and AArch64 architectures. It demonstrates that the SG2042’s C920 cores provide strong single-core advantages over other RISC-V cores and offer competitive performance relative to x86-64 and AArch64 CPUs on compute-bound kernels, though many kernels are memory bandwidth or latency bound, which limits its overall HPC competitiveness. The study identifies the memory subsystem as the primary bottleneck and documents varying performance across NPB kernels and pseudo-applications, with MPI showing advantages over OpenMP for certain configurations. The authors suggest that higher memory bandwidth in future SG generations (e.g., SG2044) and RVV enhancements could substantially improve HPC viability, supporting the SG family’s potential in exascale, energy-efficient computing.
Abstract
Whilst RISC-V has grown phenomenally quickly in embedded computing, it is yet to gain significant traction in High Performance Computing (HPC). However, as we move further into the exascale era, the flexibility offered by RISC-V has the potential to be very beneficial in future supercomputers, especially as the community places an increased emphasis on decarbonising its workloads. Sophon's SG2042 is the first mass-produced, commodity-available, high-core-count RISC-V CPU designed for high performance workloads. It was first released in summer 2023 and, at the time of writing, is becoming widely available, so a key question is whether it is a realistic proposition for HPC applications. In this paper we use NASA's NAS Parallel Benchmark (NPB) suite to characterise the performance of the SG2042 against other CPUs implementing the RISC-V, x86-64, and AArch64 ISAs. We find that the SG2042 consistently outperforms all other RISC-V solutions, delivering between a 2.6 and 16.7 times performance improvement at the single-core level. When compared against the x86-64 and AArch64 CPUs, which are commonplace for high performance workloads, we find that the SG2042 performs comparatively well with computationally bound algorithms but decreases in relative performance when the algorithms are memory bandwidth or latency bound. Based on this work, we identify the performance of the SG2042's memory subsystem as the greatest bottleneck.
