fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs

Petros Toupas; Christos-Savvas Bouganis; Dimitrios Tzovaras

TL;DR

The paper addresses the challenge of deploying throughput‑intensive 3D CNNs for Human Action Recognition on resource‑constrained FPGA devices. It introduces fpgaHART, a toolflow that leverages Synchronous Dataflow Graphs (SDFG) with branching support to model and map modern 3D CNN HAR architectures onto FPGAs, including partitioning across multiple bitstreams to fit device resources. A design space exploration framework, driven by simulated annealing, jointly optimizes per‑layer parallelism, partitioning, and memory bandwidth considerations, guided by a performance model that combines a workload matrix with the SDFG topology. Evaluation across diverse HAR models (e.g., C3D, SlowOnly, R(2+1)D, X3D) and FPGA platforms demonstrates competitive throughput and energy efficiency, establishing fpgaHART as a scalable path for edge HAR deployments on reconfigurable hardware.

Abstract

Surveillance systems, autonomous vehicles, human monitoring systems, and video retrieval are just a few of the many applications in which 3D Convolutional Neural Networks are exploited. However, their widespread use is restricted by their high computational and memory requirements, especially when they are integrated into systems with limited resources. This study proposes a toolflow that optimises the mapping of 3D CNN models for Human Action Recognition onto FPGA devices, taking into account FPGA resources and off-chip memory characteristics. The proposed system employs Synchronous Dataflow (SDF) graphs to model the designs and introduces transformations to expand and explore the design space, resulting in high-throughput designs. A variety of 3D CNN models were evaluated using the proposed toolflow on multiple FPGA devices, demonstrating its potential to deliver competitive performance compared to earlier hand-tuned and model-specific designs.
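To make the design space exploration described in the TL;DR more concrete, the following is a minimal sketch of simulated annealing over per-layer parallelism factors under a resource budget. It is not the fpgaHART implementation: the workloads, the budget, the set of legal factors, and the names `latency`, `feasible`, and `anneal` are all hypothetical, and the cost function is a toy stand-in for the paper's performance model, which combines a workload matrix with the SDFG topology.

```python
import math
import random

# Hypothetical per-layer workloads (operations per clip); not from the paper.
WORKLOADS = [8.0e8, 3.2e9, 1.6e9, 4.0e8]
BUDGET = 64                      # assumed number of parallel units on the device
CHOICES = [1, 2, 4, 8, 16, 32]   # assumed legal parallelism factors per layer


def latency(par):
    """Toy pipeline model: the slowest layer bounds the throughput."""
    return max(w / p for w, p in zip(WORKLOADS, par))


def feasible(par):
    """Resource constraint: the summed parallelism must fit the budget."""
    return sum(par) <= BUDGET


def anneal(steps=5000, t_start=1.0, t_end=1e-3, seed=0):
    rng = random.Random(seed)
    current = [1] * len(WORKLOADS)   # the minimal design is always feasible
    best = list(current)
    for i in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (i / (steps - 1))
        # Neighbour move: re-draw the parallelism factor of one random layer.
        candidate = list(current)
        candidate[rng.randrange(len(candidate))] = rng.choice(CHOICES)
        if not feasible(candidate):
            continue
        # Relative cost change keeps the temperature scale unit-free.
        delta = (latency(candidate) - latency(current)) / latency(current)
        # Always accept improvements; accept regressions with decaying probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if latency(current) < latency(best):
                best = list(current)
    return best


best = anneal()
print("parallelism per layer:", best)
print("bottleneck cost:", latency(best))
```

The bottleneck-style cost reflects the throughput-oriented goal: in a streaming pipeline, spending resources on anything other than the slowest layer does not raise throughput. A fuller model would also account for partitioning across bitstreams and memory bandwidth, which this sketch omits.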

Paper Structure (14 sections, 5 equations, 3 figures, 4 tables)

Introduction
Background
Hardware-Level Interpretation
3D CNN layers as DAG nodes
SDFG representation with branch support
3D CNN layers as hardware building blocks
Design Space Exploration
3D CNN Model Partitioning
Partition-Specific Optimisations
Performance Modelling
Evaluation
Modeling Accuracy Evaluation
Performance Comparison
Conclusion

Figures (3)

Figure 1: The Kinetics-400 Pareto front is dominated by 3D CNNs at small parameter counts, demonstrating the deployability of 3D CNNs on edge devices with limited resources.
Figure 2: Throughput (GOPs/s) of fpgaHART-generated designs for 3D CNN HAR models, showing high-throughput results on a variety of FPGA devices.
Figure 3: Pareto front for 3D CNNs: clips/s versus accuracy. The fpgaHART results were obtained on the VC709 FPGA platform and lie on the Pareto front.
