fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs
Petros Toupas, Christos-Savvas Bouganis, Dimitrios Tzovaras
TL;DR
The paper addresses the challenge of deploying computationally demanding 3D CNNs for Human Action Recognition (HAR) on resource-constrained FPGA devices. It introduces fpgaHART, a toolflow that uses Synchronous Dataflow (SDF) graphs with branching support to model modern 3D CNN HAR architectures and map them onto FPGAs, partitioning a model across multiple bitstreams when it does not fit the device's resources. A design space exploration framework, driven by simulated annealing, jointly optimizes per-layer parallelism and partitioning under memory bandwidth constraints, guided by a performance model that combines a workload matrix with the SDF graph topology. Evaluation across diverse HAR models (e.g., C3D, SlowOnly, R(2+1)D, X3D) and FPGA platforms demonstrates competitive throughput and energy efficiency, positioning fpgaHART as a scalable path for edge HAR deployments on reconfigurable hardware.
Abstract
Surveillance systems, autonomous vehicles, human monitoring systems, and video retrieval are just a few of the many applications in which 3D Convolutional Neural Networks are exploited. However, their widespread use is restricted by their high computational and memory requirements, especially when they are integrated into systems with limited resources. This study proposes a toolflow that optimises the mapping of 3D CNN models for Human Action Recognition onto FPGA devices, taking into account FPGA resources and off-chip memory characteristics. The proposed system employs Synchronous Dataflow (SDF) graphs to model the designs and introduces transformations to expand and explore the design space, resulting in high-throughput designs. A variety of 3D CNN models were evaluated using the proposed toolflow on multiple FPGA devices, demonstrating its potential to deliver competitive performance compared to earlier hand-tuned and model-specific designs.
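
To make the idea of exploring the design space concrete, the following is a minimal sketch of a simulated-annealing search over per-layer parallelism factors. It is not the fpgaHART implementation: the cost model (a streaming pipeline whose latency is set by its slowest layer), the single resource budget `dsp_budget`, and the names `pipeline_latency` and `anneal` are all assumptions made for illustration. Partitioning across bitstreams, branching and off-chip memory bandwidth are deliberately left out.

```python
import math
import random


def pipeline_latency(workloads, factors):
    """Cycles per input for a streaming pipeline (assumed model):
    each layer needs ceil(work / parallelism) cycles and the slowest
    layer determines the overall rate."""
    return max(math.ceil(w / p) for w, p in zip(workloads, factors))


def anneal(workloads, dsp_budget, steps=5000, t_start=1.0, t_end=1e-3, seed=0):
    """Search per-layer parallelism factors with simulated annealing.

    Hypothetical constraint: the factors must sum to at most dsp_budget.
    """
    if len(workloads) > dsp_budget:
        raise ValueError("budget too small to give every layer one unit")

    rng = random.Random(seed)
    factors = [1] * len(workloads)            # feasible starting point
    cost = pipeline_latency(workloads, factors)
    best, best_cost = list(factors), cost

    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)

        # Perturb one randomly chosen layer by +/- 1 unit of parallelism.
        candidate = list(factors)
        i = rng.randrange(len(candidate))
        candidate[i] = max(1, candidate[i] + rng.choice((-1, 1)))
        if sum(candidate) > dsp_budget:
            continue                           # reject infeasible moves

        new_cost = pipeline_latency(workloads, candidate)
        delta = (new_cost - cost) / cost       # relative change in latency

        # Always accept improvements; accept regressions with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            factors, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = list(factors), cost

    return best, best_cost


if __name__ == "__main__":
    layer_work = [1000, 4000, 2000]            # made-up per-layer workloads
    found, latency = anneal(layer_work, dsp_budget=14)
    print(found, latency)
```

Occasionally accepting worse configurations lets the search escape local minima, which matters when the bottleneck shifts from one layer to another as parallelism is reallocated.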
