ONNX-to-Hardware Design Flow for Adaptive Neural-Network Inference on FPGAs
Federico Manca, Francesco Ratto, Francesca Palumbo
TL;DR
The paper addresses the need of edge cyber-physical systems (CPS) for diverse, energy-efficient NN inference on resource-limited FPGAs. It proposes an ONNX-to-Hardware design flow that supports quantized CNNs through QONNX and introduces data and computation approximation to provide adaptivity. The flow combines a streaming CNN accelerator template, dataflow-driven HLS generation, and a coarse-grained reconfigurability pipeline based on the Multi-Dataflow Composer (MDC) to enable switching between profiles at runtime. Quantization-aware training of mixed-precision profiles exposes the trade-off between accuracy and power, and merging the profiles with MDC yields an adaptive inference engine that can reduce power with minimal accuracy loss. The approach aims to enable flexible, adaptive edge inference for CPS, and the authors plan to scale it to more complex models and datasets within the EU MYRTUS project.
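Data approximation through reduced precision can be illustrated with a uniform quantize-dequantize step of the kind QONNX encodes in its `Quant` node (scale, zero point, bit width, signedness, narrow range). The sketch below is not the authors' flow: the function `quant_dequant`, the weight values, and the two profile names are hypothetical, and rounding follows Python's `round` (round half to even). It only shows how a lower-precision profile trades higher quantization error for a cheaper datapath.

```python
def quant_dequant(x, scale, zero_point, bitwidth, signed=True, narrow=False):
    """Fake-quantize a list of floats with a uniform quantizer (sketch)."""
    # Integer range representable with the given bit width.
    if signed:
        qmin = -(2 ** (bitwidth - 1)) + (1 if narrow else 0)
        qmax = 2 ** (bitwidth - 1) - 1
    else:
        qmin = 0
        qmax = 2 ** bitwidth - 1 - (1 if narrow else 0)
    out = []
    for v in x:
        q = round(v / scale + zero_point)   # map to the integer grid
        q = max(qmin, min(qmax, q))         # saturate to the representable range
        out.append((q - zero_point) * scale)  # map back to real values
    return out


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical weights and precision profiles (not taken from the paper).
weights = [0.73, -0.41, 0.05, -0.98, 0.27, 0.61]
profiles = {"hi_acc": 8, "lo_power": 4}

errors = {}
for name, bits in profiles.items():
    # Symmetric per-tensor scale so the largest magnitude maps to qmax.
    scale = max(abs(w) for w in weights) / (2 ** (bits - 1) - 1)
    approx = quant_dequant(weights, scale, 0, bits)
    errors[name] = mean_abs_error(weights, approx)
    print(f"{name}: {bits}-bit, mean abs error = {errors[name]:.4f}")
```

The 4-bit profile shows a larger reconstruction error than the 8-bit one; in an adaptive engine, such error would be weighed against the power saved by the narrower arithmetic.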
Abstract
The challenges involved in executing neural networks (NNs) at the edge include providing diversity, flexibility, and sustainability. This implies, for instance, supporting evolving applications and algorithms in an energy-efficient way. Hardware or software accelerators can deliver fast and efficient NN computation, while flexibility can be exploited to support long-term adaptivity. Handcrafting an NN accelerator for a specific device can lead to an optimal solution, but it takes time and expertise; this is why frameworks for generating hardware accelerators are being developed. Starting from a preliminary, semi-integrated ONNX-to-hardware toolchain [21], this work focuses on enabling approximate computing by leveraging the original toolchain's distinctive support for adaptivity. The goal is to allow lightweight, adaptable NN inference on FPGAs at the edge.
