RealProbe: An Automated and Lightweight Performance Profiler for In-FPGA Execution of High-Level Synthesis Designs

Jiho Kim, Cong Hao

TL;DR

RealProbe addresses the problem of inaccurate performance profiling for HLS-based FPGA designs by providing a fully automated, non-intrusive in-FPGA profiler that integrates with Vitis HLS and Vivado. It automatically maps C++ constructs to RTL signals, externalizes control signals, and logs precise cycle counts with minimal on-chip overhead, offloading data to DRAM as needed. Through incremental synthesis and automated design space exploration, RealProbe balances resource usage, DRAM bandwidth, and maximum frequency to create Pareto-optimal profiling configurations, validated across 28 designs and two FPGA platforms with 100% cycle-count accuracy relative to ILA. The approach enables detailed, scalable bottleneck analysis and visualization that aligns closely with actual hardware execution, improving profiling reliability and developer productivity.

Abstract

High-level synthesis (HLS) accelerates FPGA design by rapidly generating diverse implementations using optimization directives. However, even with cycle-accurate C/RTL co-simulation, the reported clock cycles often differ significantly from actual FPGA performance. This discrepancy hampers accurate bottleneck identification, leading to suboptimal design choices. Existing in-FPGA profiling tools, such as the Integrated Logic Analyzer (ILA), require tedious inspection of HLS-generated RTL and manual signal monitoring, reducing productivity. To address these challenges, we introduce RealProbe, the first fully automated, lightweight in-FPGA profiling tool for HLS designs. With a single directive, #pragma HLS RealProbe, the tool automatically generates all necessary code to profile cycle counts across the full function hierarchy, including submodules and loops. RealProbe extracts, records, and visualizes cycle counts with high precision, providing actionable insights into on-board performance. RealProbe is non-intrusive, implemented as independent logic to ensure minimal impact on kernel functionality or timing. It also supports automated design space exploration (DSE), optimizing resource allocation based on FPGA constraints and module complexity. By leveraging incremental synthesis and implementation, DSE runs independently of the original HLS kernel. Evaluated across 28 diverse test cases, including a large-scale design, RealProbe achieves 100% accuracy in capturing cycle counts with minimal logic overhead: just 16.98% LUTs, 43.15% FFs, and 0% BRAM usage. The tool, with full documentation and examples, is available on GitHub at https://github.com/sharc-lab/RealProbe.
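To make the single-directive workflow concrete, the following is a minimal sketch of where the pragma might sit in an HLS kernel. The kernel name accumulate, the array size N, and the placement of the pragma at the top of the function body are assumptions made for illustration. Only the directive text #pragma HLS RealProbe comes from the abstract. A standard C++ compiler ignores the unknown pragma, so the function still runs as ordinary software.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical array size, chosen only for this illustration.
constexpr std::size_t N = 8;

// Hypothetical kernel in the spirit of an array-accumulation design.
// The function name and signature are assumptions, not taken from the paper.
std::int32_t accumulate(const std::int32_t data[N]) {
    // Directive named in the abstract. Its placement at the top of the
    // function body is an assumption.
#pragma HLS RealProbe
    assert(data != nullptr);  // software-side sanity check
    std::int32_t sum = 0;
    for (std::size_t i = 0; i < N; ++i) {
        sum += data[i];  // the loop is a candidate for per-loop cycle counts
    }
    return sum;
}
```

Called on the values 1 through 8, accumulate returns 36 in software. The cycle counts described in the abstract would come only from running the design through the RealProbe flow on an FPGA.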

Paper Structure

This paper contains 21 sections, 1 equation, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Cycle count discrepancies among HLS C-synth estimates, C/RTL co-simulation, and on-board execution for two HLS designs on the Pynq-Z2 FPGA. Left: array accumulation; right: matrix multiplication.
  • Figure 2: Previous intrusive profiling tools with highlighted instrumentation code. (a) HLScope [choi2017hlscope] requires manually inserting FIFOs and a dataflow pragma. (b) Bensalem et al. [bensalem2020opencl] requires fine-grained instrumentation.
  • Figure 3: End-to-end automated RealProbe integrated with Vitis HLS and Vivado. The only user input for profiling is #pragma HLS RealProbe.
  • Figure 4: An example of using the RealProbe pragma in a non-intrusive fashion, with the profiling results for the function and loop hierarchy.
  • Figure 5: RealProbe’s modified LLVM flow accurately maps RTL modules to C functions, while the original flow fails (marked 'X').
  • ...and 9 more figures