Is RISC-V Ready for Machine Learning? Portable Gaussian Processes Using Asynchronous Tasks

Alexander Strack; Patrick Diehl; Dirk Pflüger

Abstract

Gaussian processes are widely used in machine learning but remain computationally demanding, which limits how efficiently they scale across diverse hardware platforms. The GPRat library addresses these challenges using the asynchronous many-task runtime system HPX. In this work, we extend GPRat to enable portability across multiple hardware architectures and evaluate its performance on representative x86-64, ARM, and RISC-V chips. We conduct node-level strong-scaling and problem-size-scaling benchmarks for Gaussian process prediction and hyperparameter optimization to assess single-core performance, parallel scalability, and architectural efficiency. Our results show that while the x86-64 Zen 2 chip achieves a 58% single-core performance advantage over the ARM-based Fujitsu A64FX, superior parallel scaling allows the 48-core ARM chip to outperform the 64-core Zen 2 by 9% at full node utilization. The evaluated SOPHON SG2042 RISC-V chip exhibits substantially lower performance and weaker scalability: its single-core performance lags by up to a factor of 14, and large-scale parallel workloads show slowdowns of up to a factor of 25. For problem-size scaling, the ARM and x86-64 systems perform within 25% of each other. These findings highlight the growing competitiveness of ARM-based processors and emphasize the importance of wide-register vectorization support and memory subsystem improvements for upcoming RISC-V platforms.
