SPRING: Systematic Profiling of Randomly Interconnected Neural Networks Generated by HLS
Rui Shi, Seda Ogrenci
TL;DR
SPRING proposes a dynamic on-FPGA profiling framework for HLS-based streaming accelerators on SoC FPGAs that streams profiling data alongside the main data to monitor runtime behavior, notably FIFO fullness. It integrates with hls4ml to automate the generation and profiling of Randomly Interconnected Neural Networks (RINNs) and validates the approach against co-simulation, revealing both resource overhead and actionable patterns in FIFO sizing. The key contributions include the ability to profile over 200 internal signals at the HLS level, an end-to-end automated profiling pipeline, and design guidance for initial FIFO sizing across diverse RINN configurations. The framework improves real-time observability for edge ML accelerators and paves the way for extending profiling to additional metrics such as latency and runtime states, while highlighting practical optimization opportunities to reduce interference and overhead.
Abstract
Profiling supports performance optimization by providing real-time observations and measurements of important parameters of hardware execution. Profiling tools for High-Level Synthesis (HLS) IPs running on FPGAs remain far less mature than those developed for fixed CPU and GPU architectures, mainly because of the dynamic architecture of FPGAs. This limitation is reflected in the typical approaches: extracting monitoring signals from the FPGA device individually through dedicated ports, using one BRAM per signal for temporary information storage, or embedding vendor-specific primitives to manually analyze the waveform. In this paper, we propose a systematic profiling method tailored to the dynamic nature of FPGA systems and particularly suitable for streaming accelerators. Instead of relying on signal extraction, the proposed profiling stream flows alongside the actual data, dynamically splitting and merging in synchrony with the data stream, and is ultimately directed to the processing system (PS) side. We conducted a preliminary evaluation of this method on randomly interconnected neural networks (RINNs) using the FIFO fullness metric, with co-simulation results for validation.
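To make the idea of a profiling stream that splits and merges with the data concrete, the following is a minimal software sketch. It is a toy model, not the SPRING implementation or generated HLS code: the names `Token`, `ProfiledFifo`, `split`, `merge`, and `run` are hypothetical, the FIFO depth of 4 is arbitrary, and "fullness" is approximated here as the peak occupancy each FIFO has observed so far. Each token carries a small profiling record next to its data value; the record is copied at a split and combined at a merge.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Token:
    """A data value plus the profiling record travelling alongside it."""
    value: int
    profile: dict = field(default_factory=dict)  # FIFO name -> peak occupancy


class ProfiledFifo:
    """Bounded FIFO that tracks its own peak occupancy (hypothetical model)."""

    def __init__(self, name, depth):
        self.name = name
        self.depth = depth
        self.queue = deque()
        self.peak = 0

    def empty(self):
        return not self.queue

    def push(self, token):
        if len(self.queue) >= self.depth:
            raise OverflowError(f"FIFO {self.name} is full")
        self.queue.append(token)
        self.peak = max(self.peak, len(self.queue))

    def pop(self):
        token = self.queue.popleft()
        # Attach this FIFO's fullness sample to the token as it leaves.
        token.profile[self.name] = self.peak
        return token


def split(token):
    """Duplicate data and profiling record onto two branches."""
    return (Token(token.value, dict(token.profile)),
            Token(token.value, dict(token.profile)))


def merge(left, right):
    """Join two branches: add the data, keep the larger sample per FIFO."""
    profile = dict(left.profile)
    for name, peak in right.profile.items():
        profile[name] = max(profile.get(name, 0), peak)
    return Token(left.value + right.value, profile)


def run():
    """Source -> split -> (fast branch a, slow branch b) -> merge."""
    fifo_in = ProfiledFifo("in", 4)
    fifo_a = ProfiledFifo("a", 4)
    fifo_b = ProfiledFifo("b", 4)
    for value in range(4):
        fifo_in.push(Token(value))

    drained_a = []
    while not fifo_in.empty():
        token_a, token_b = split(fifo_in.pop())
        fifo_a.push(token_a)
        fifo_b.push(token_b)
        drained_a.append(fifo_a.pop())  # branch a is consumed immediately

    # Branch b is only drained now, so its FIFO has filled up in the meantime.
    return [merge(token_a, fifo_b.pop()) for token_a in drained_a]


if __name__ == "__main__":
    for token in run():
        print(token.value, token.profile)
```

In this sketch the consumer at the end of the pipeline plays the role of the PS side: it receives every profiling record in-band with the data rather than through dedicated ports. The imbalance between the fast and slow branches shows up directly in the merged record, which is the kind of observation that could inform an initial FIFO depth.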
