FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition

Petros Toupas; Christos-Savvas Bouganis; Dimitrios Tzovaras

TL;DR

The proposed toolflow generates an optimised stream-based hardware system, taking into account the available resources and off-chip memory characteristics of the FPGA device, and enables for the first time the targeting of such complex model architectures for the Human Action Recognition task.

Abstract

3D Convolutional Neural Networks are gaining increasing attention from researchers and practitioners and have found applications in many domains, such as surveillance systems, autonomous vehicles, human monitoring systems, and video retrieval. However, their widespread adoption is hindered by their high computational and memory requirements, especially when resource-constrained systems are targeted. This paper addresses the problem of mapping X3D, a state-of-the-art model in Human Action Recognition that achieves an accuracy of 95.5% on the UCF101 benchmark, onto any FPGA device. The proposed toolflow generates an optimised stream-based hardware system, taking into account the available resources and off-chip memory characteristics of the FPGA device. The generated designs push the current performance-accuracy Pareto front further and enable, for the first time, the targeting of such complex model architectures for the Human Action Recognition task.

Paper Structure (18 sections, 5 equations, 6 figures, 2 tables)

Introduction
Background
fpgaConvNet
Related Work
X3D Model Family
Hardware-Level Interpretation of X3D
X3D layers as DAG nodes
SDFG representation with branch support
X3D layers as hardware building blocks
Streaming-Centric Optimizations
Design Space Exploration
X3D Model Partitioning
DSE Within X3D Partitions
Performance Modeling
Evaluation
...and 3 more sections

Figures (6)

Figure 1: Kinetics-400 Pareto front over the years
Figure 2: Model performance over different FPGA devices and resource constraints
Figure 3: Quantization results on UCF101 over different word lengths. $W_X$ denotes the word length of the fixed-point representation of the weights, and $FM_X$ that of the feature maps
Figure 4: Main partition types of the X3D model: (a) Type 1, (b) Type 2, (c) Type 3
Figure 5: Latency (ms) MAPE between modeling performance and co-simulation
...and 1 more figures
