Co-designing a Programmable RISC-V Accelerator for MPC-based Energy and Thermal Management of Many-Core HPC Processors

Alessandro Ottaviano, Andrino Meli, Paul Scheffler, Giovanni Bambini, Robert Balas, Davide Rossi, Andrea Bartolini, Luca Benini

TL;DR

The paper tackles real-time energy and thermal management for hundreds of processing elements (PEs) in HPC processors by co-designing a lightweight MPC that runs on an embedded multi-core RISC-V controller. It combines an operator-splitting QP solver with pruning of weak thermal couplings and an ahead-of-time scheduling framework that exploits sparsity, enabling sub-millisecond latency for 144 PEs at 500 MHz. The approach uses offline precomputation of the sparse linear system, warm starting, and hardware loop extensions on ControlPULP Snitch to achieve up to $33\times$ lower latency and $7.9\times$ higher energy efficiency than a single-core baseline, with a memory footprint under 1 MiB and power consumption as low as 325 mW. This work demonstrates that advanced predictive control for large-scale chip-level management can be effectively delegated to on-chip hardware, reducing reliance on power-hungry software stacks and external processors while preserving optimal performance and safety margins.

Abstract

Managing energy and thermal profiles is critical for many-core HPC processors with hundreds of application-class processing elements (PEs). Advanced model predictive control (MPC) delivers state-of-the-art performance but requires solving an online optimization problem over a thousand times per second (1 kHz control bandwidth), with computational and memory demands scaling with PE count. Traditional MPC approaches execute the controller on the PEs, but operating system overheads create jitter and limit control bandwidth. Running MPC on dedicated on-chip controllers enables fast, deterministic control but raises concerns about area and power overhead. In this work, we tackle these challenges by proposing a hardware-software codesign of a lightweight MPC controller, based on an operator-splitting quadratic programming solver and an embedded multi-core RISC-V controller. Key innovations include pruning weak thermal couplings to reduce model memory and ahead-of-time scheduling for efficient parallel execution of sparse triangular systems arising from the optimization problem. The proposed controller achieves sub-millisecond latency when controlling 144 PEs at 500 MHz, delivering 33x lower latency and 7.9x higher energy efficiency than a single-core baseline. Operating within a compact memory footprint of less than 1 MiB, it consumes as little as 325 mW while occupying less than 1.5% of a typical HPC processor's die area.
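
To make the idea of ahead-of-time scheduling for sparse triangular systems more concrete, the following is a minimal sketch of level scheduling, a standard technique for parallelizing sparse triangular solves. It is not necessarily the scheduling framework used in the paper. The names `level_schedule` and `solve_with_schedule`, the row-wise dictionary storage, and the 4x4 example matrix are all assumptions made for illustration. Because the sparsity pattern is fixed, the levels can be computed once offline; at run time, rows within a level are independent and could be distributed across cores, with a barrier between levels.

```python
from typing import Dict, List

# Hypothetical storage: row i maps column j (j <= i) to the nonzero L[i][j].
SparseLower = List[Dict[int, float]]


def level_schedule(L: SparseLower) -> List[List[int]]:
    """Offline step: group rows into levels with no dependencies inside a level."""
    level_of = [0] * len(L)
    for i, row in enumerate(L):
        deps = [j for j in row if j < i]  # rows whose solution x[j] row i needs
        level_of[i] = 1 + max((level_of[j] for j in deps), default=-1)
    levels: List[List[int]] = [[] for _ in range(max(level_of, default=-1) + 1)]
    for i, lvl in enumerate(level_of):
        levels[lvl].append(i)
    return levels


def solve_with_schedule(L: SparseLower, b: List[float],
                        levels: List[List[int]]) -> List[float]:
    """Online step: forward substitution for L x = b following the schedule."""
    x = [0.0] * len(b)
    for rows in levels:      # levels execute in order (a barrier on real hardware)
        for i in rows:       # rows in one level are independent (parallelizable)
            acc = b[i] - sum(v * x[j] for j, v in L[i].items() if j < i)
            x[i] = acc / L[i][i]
    return x


# Illustrative 4x4 system (made-up values, not taken from the paper).
L = [
    {0: 2.0},
    {1: 4.0},
    {0: 1.0, 2: 1.0},
    {1: 2.0, 2: 1.0, 3: 2.0},
]
b = [2.0, 4.0, 3.0, 8.0]

levels = level_schedule(L)
x = solve_with_schedule(L, b, levels)
print(levels)
print(x)
```

In this sketch, rows 0 and 1 have no dependencies and share the first level, so a two-core controller could process them concurrently. Sparser factors (for example, after pruning weak couplings) tend to produce fewer and wider levels, which is what makes this kind of scheduling attractive for parallel execution.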

Paper Structure

This paper contains 52 sections, 16 equations, 16 figures, 1 table, 1 algorithm.

Figures (16)

  • Figure 1: Illustration of (a) and (b) centric paradigms for of chips. Blocks in red indicate the hardware/software entity responsible for low-level interaction with sensors and actuators (in purple) during .
  • Figure 2: ControlPULP microarchitecture. The Snitch and core complex with and frep extensions are highlighted.
  • Figure 3: Thermal components with heat propagation flow of an chiplet (left) and lumped parameter circuit of two finite elements from the silicon die and copper heat spreader, respectively (right).
  • Figure 4: Architectures of controller (right) and plant (left). Both are modeled in MATLAB; the controller architecture is progressively refined using .
  • Figure 5: Problem size (total number of nonzeros) of vanilla and pruned for at a varying number of controlled and fixed horizon $H_p = 2$. uses a cutoff of $0.005$. For comparison, the figure reports the problem size of other problems, reproduced from FerrauABB_1.
  • ...and 11 more figures