Full-stack evaluation of Machine Learning inference workloads for RISC-V systems
Debjyoti Bhattacharjee, Anmol, Tommaso Marinelli, Karan Pathak, Peter Kourzanov
TL;DR
The paper addresses the challenge of evaluating ML inference workloads on RISC-V through architectural simulation. It combines the gem5 full-system simulator with an MLIR/IREE-based compilation flow that maps diverse deep learning (DL) models onto a baseline rv64gc processor, and uses this setup to benchmark a broad set of ML workloads. Key contributions include a reproducible cross-platform benchmarking framework and several performance insights: speedups of up to 5.22x when switching from the in-order Minor CPU model to the out-of-order O3 model, the memory-bound character of the workloads, and issues with vector instructions in early versions of gem5. The work highlights current simulator limitations and outlines concrete future directions (MLPerf benchmarks, the Zephyr POSIX layer, and FPGA-based validation) to advance RISC-V hardware-software co-design. The results are of practical use to researchers and designers who want to evaluate and refine ML inference workloads on RISC-V architectures.
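The Minor-versus-O3 speedup compares the same workload run on two gem5 CPU models. The following is a minimal sketch, assuming speedup is defined as the ratio of simulated time reported in gem5's `stats.txt`. Recent gem5 releases call this statistic `simSeconds`; older ones call it `sim_seconds`. The helper names `read_stat` and `speedup` are hypothetical and are not part of gem5 or of the paper's framework.

```python
import re


def read_stat(stats_text: str, name: str) -> float:
    """Return the value of statistic `name` from gem5 stats.txt content.

    Assumption: only the first statistics dump is read; later dumps in the
    same file are ignored.
    """
    # A stats.txt line looks like: "<name>   <value>   # <description>"
    pattern = re.compile(rf"^{re.escape(name)}\s+([0-9.eE+-]+)", re.MULTILINE)
    match = pattern.search(stats_text)
    if match is None:
        raise KeyError(f"statistic {name!r} not found")
    return float(match.group(1))


def speedup(baseline_stats: str, candidate_stats: str,
            stat: str = "simSeconds") -> float:
    """Speedup of the candidate run over the baseline run.

    Computed as baseline simulated time / candidate simulated time, so a
    value above 1.0 means the candidate (e.g. an O3 run) finished sooner
    than the baseline (e.g. a Minor run) in simulated time.
    """
    return read_stat(baseline_stats, stat) / read_stat(candidate_stats, stat)
```

To use it, pass the contents of the Minor run's `stats.txt` as the baseline and the O3 run's file as the candidate. For older gem5 output, pass `stat="sim_seconds"`. Simulated time is used here rather than host wall-clock time, because host time measures how fast the simulator runs, not how fast the modeled core does.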
Abstract
Architectural simulators play a vital role in RISC-V research, providing a crucial platform for workload evaluation without the need for costly physical prototypes. They serve as a dynamic environment for exploring innovative architectural concepts, enabling swift iteration and thorough analysis of performance metrics. As deep learning algorithms become increasingly pervasive, it is essential to benchmark new architectures with machine learning workloads. The diverse computational kernels used in deep learning algorithms highlight the need for a comprehensive compilation toolchain to map them onto target hardware platforms. This study evaluates the performance of a wide array of machine learning workloads on RISC-V architectures using gem5, an open-source architectural simulator. Leveraging an open-source compilation toolchain based on Multi-Level Intermediate Representation (MLIR), the research presents benchmarking results specifically focused on deep learning inference workloads. Additionally, the study sheds light on the current limitations of gem5 when simulating RISC-V architectures, offering insights for future development and refinement.
