Blink: Fast Automated Design of Run-Time Power Monitors on FPGA-Based Computing Platforms
Andrea Galimberti, Michele Piccoli, Davide Zoni
TL;DR
Blink addresses the time-to-solution bottleneck in designing run-time power monitors for FPGA-based heterogeneous platforms. It replaces gate-level simulations and post-route power-trace extraction with fast behavioral simulations and direct power measurements on the FPGA. The framework aligns the measured power traces with the switching activity generated by RTL simulation and identifies the power model from the measured data, which lets it scale across accelerator designs. Experimental results show overheads under 3% and an RMSE below 5%, with an average time-to-solution speedup of 18× over state-of-the-art methodologies, shortening the design cycle from days to roughly a day. The result is fast, accurate, and scalable run-time power monitoring for complex FPGA-based systems, in support of run-time energy-aware optimization.
Abstract
Current over-provisioned heterogeneous multi-cores require effective run-time optimization strategies, and the run-time power monitoring subsystem is paramount for their success. Several state-of-the-art methodologies address the design of a run-time power monitoring infrastructure for generic computing platforms. However, training the power model requires time-consuming gate-level simulations that, coupled with the ever-increasing complexity of modern heterogeneous platforms, dramatically hinder the usability of such solutions. This paper introduces Blink, a scalable framework for the fast and automated design of run-time power monitoring infrastructures targeting computing platforms implemented on FPGA. Blink shortens the time-to-solution for delivering the run-time power monitoring infrastructure by replacing the gate-level simulations and power trace computations of traditional methodologies with behavioral simulations and direct power trace measurements. Applying Blink to multiple designs that combine HLS-generated accelerators from a state-of-the-art benchmark suite demonstrates an average time-to-solution speedup of 18 times without affecting the quality of the run-time power estimates.
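
To make the idea of pairing behavioral-simulation activity with directly measured power more concrete, the following minimal Python sketch shows one possible way to align the two traces and identify a simple model. It is an illustration under stated assumptions, not Blink's actual implementation: the function names (pearson, best_lag, fit_linear, relative_rmse), the single aggregated toggle-count feature, the correlation-based lag search, and the RMSE normalized by mean measured power are all hypothetical choices, and the data are synthetic.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def best_lag(activity, power, max_lag):
    """Delay (in samples) of the measured power trace relative to the simulated
    activity, chosen as the lag in 0..max_lag with the highest correlation.
    Assumes max_lag < len(power)."""
    def score(lag):
        n = min(len(activity), len(power) - lag)
        return pearson(activity[:n], power[lag:lag + n])
    return max(range(max_lag + 1), key=score)

def fit_linear(xs, ys):
    """Ordinary least squares for y ~ intercept + slope * x (single feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def relative_rmse(predicted, measured):
    """RMSE divided by the mean measured power (one possible normalization)."""
    n = len(measured)
    mse = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n
    return sqrt(mse) / (sum(measured) / n)

# Synthetic example: per-window toggle counts from a behavioral simulation
# (hypothetical values) and a "measured" trace that starts two samples late.
activity = [0, 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
power = [0.5, 0.5] + [0.5 + 0.1 * a for a in activity]

lag = best_lag(activity, power, max_lag=4)
aligned = power[lag:lag + len(activity)]
intercept, slope = fit_linear(activity, aligned)
estimate = [intercept + slope * a for a in activity]
error = relative_rmse(estimate, aligned)
print(lag, intercept, slope, error)
```

In this sketch the intercept plays the role of static power and the slope that of the dynamic power contributed per toggle. A realistic model would presumably regress on the activity of several selected signals rather than a single aggregate count, which requires a multivariate least-squares solver instead of the closed-form single-feature fit used here.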
