SPRING: Systematic Profiling of Randomly Interconnected Neural Networks Generated by HLS

Rui Shi, Seda Ogrenci

TL;DR

SPRING proposes a dynamic on-FPGA profiling framework for HLS-based streaming accelerators on SoC FPGAs that streams profiling data alongside the main data to monitor runtime behavior, notably the FIFO fullness. It integrates with hls4ml to automate the generation and profiling of Randomly Interconnected Neural Networks (RINNs) and validates the approach against co-simulation, revealing both resource overhead and actionable patterns in FIFO sizing. The key contributions include the ability to profile over 200 internal signals at the HLS level, an end-to-end automated profiling pipeline, and design guidance for initial FIFO sizing across diverse RINN configurations. The framework improves real-time observability for edge ML accelerators and paves the way for extending profiling to additional metrics such as latency and runtime states while highlighting practical optimization opportunities to reduce interference and overhead.

Abstract

Profiling supports performance optimization by providing real-time observations and measurements of important parameters of hardware execution. Existing profiling tools for High-Level Synthesis (HLS) IPs running on FPGAs are far less mature than those developed for fixed CPU and GPU architectures, mainly because of the dynamic architecture of FPGAs. This limitation is reflected in the typical approaches: extracting monitoring signals from an FPGA device individually through dedicated ports, using one BRAM per signal for temporary information storage, or embedding vendor-specific primitives to manually analyze the waveform. In this paper, we propose a systematic profiling method tailored to the dynamic nature of FPGA systems, particularly suitable for streaming accelerators. Instead of relying on signal extraction, the proposed profiling stream flows alongside the actual data, dynamically splitting and merging in synchrony with the data stream, and is ultimately directed to the processing system (PS) side. We conducted a preliminary evaluation of this method on randomly interconnected neural networks (RINNs) using the FIFO fullness metric, with co-simulation results for validation.
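To make the FIFO fullness metric concrete, the following is a minimal software sketch of what such a measurement records. It is a toy cycle-by-cycle model, not SPRING's hardware mechanism: the function name `peak_fifo_fullness`, the write/read patterns, and the depth values are all hypothetical and chosen only for illustration.

```python
from collections import deque


def peak_fifo_fullness(write_pattern, read_pattern, depth):
    """Toy model of one bounded FIFO (hypothetical, for illustration only).

    write_pattern / read_pattern: per-cycle flags (1 = attempt, 0 = idle).
    Returns (peak occupancy, number of writes that found the FIFO full).
    """
    fifo = deque()
    peak = 0
    stalled_writes = 0
    for wants_write, wants_read in zip(write_pattern, read_pattern):
        # The consumer pops first, if it is active and data is available.
        if wants_read and fifo:
            fifo.popleft()
        # The producer pushes unless the FIFO is already at capacity.
        if wants_write:
            if len(fifo) < depth:
                fifo.append(1)
            else:
                # Counted as a stall; this toy model does not retry the write.
                stalled_writes += 1
        # The profiled quantity: highest occupancy seen so far.
        peak = max(peak, len(fifo))
    return peak, stalled_writes


# Producer writes every cycle, consumer reads every other cycle.
writes = [1] * 8
reads = [0, 1] * 4
print(peak_fifo_fullness(writes, reads, depth=16))  # (4, 0)
print(peak_fifo_fullness(writes, reads, depth=2))   # (2, 2)
```

In this sketch, a generously sized FIFO reveals the peak occupancy the workload actually needs, while an undersized one saturates and stalls the producer. This is the kind of observation that can inform an initial FIFO size.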

Paper Structure

This paper contains 18 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Profiling Flow
  • Figure 2: Comparison of the typical profiling structure and our proposed profiling structure.
  • Figure 3: ZCU102, Conv2D stacking. Data is collected from the Vivado implementation report. The overhead is computed by subtracting the resource requirement of the original (non-profiled) version from that of the design with profiling structures, and then dividing by the number of concurrently profiled signals.
  • Figure 4: ZCU102, Conv2D stacking. A total of 79 signals are profiled in this RINN for each precision. Data is collected from the Vivado implementation report. The overhead is computed by subtracting the resource requirement of the original (non-profiled) version from that of the design with profiling structures, and then dividing by the number of concurrently profiled signals. For this FIFO size scenario, bitwidths below 6 lead to overflow.
  • Figure 5: Complexity Influence on FIFO Fullness.
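
The per-signal overhead described in the captions of Figures 3 and 4 can be sketched in a few lines. The resource numbers below are made up purely for illustration and are not taken from the paper; the function names are hypothetical. The bitwidth helper assumes the profiled value is stored in a plain unsigned counter, which is an assumption rather than something the captions state.

```python
def per_signal_overhead(profiled, baseline, num_signals):
    """(profiled - baseline) / number of concurrently profiled signals,
    computed per resource type, as described in the figure captions."""
    return {
        resource: (profiled[resource] - baseline[resource]) / num_signals
        for resource in baseline
    }


def min_counter_bits(max_value):
    """Smallest unsigned bitwidth that can hold max_value without overflow
    (assumes an unsigned counter; at least 1 bit)."""
    return max(1, max_value.bit_length())


# Hypothetical utilization figures, NOT values from the paper.
baseline = {"LUT": 1000, "FF": 1000}
profiled = {"LUT": 1790, "FF": 2580}

print(per_signal_overhead(profiled, baseline, num_signals=79))
# {'LUT': 10.0, 'FF': 20.0}

# Under the unsigned-counter assumption, a 6-bit minimum corresponds to a
# peak value somewhere between 32 and 63.
print(min_counter_bits(32), min_counter_bits(63), min_counter_bits(31))
# 6 6 5
```

Under that assumption, the statement that bitwidths below 6 overflow would mean the largest profiled value in that scenario falls in the range 32 to 63.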