Architectural Design and Performance Analysis of FPGA based AI Accelerators: A Comprehensive Review

Soumita Chatterjee; Sudip Ghosh; Tamal Ghosh; Hafizur Rahaman

TL;DR

This review covers hardware-level optimizations for DL, including loop pipelining, parallelism, quantization, and memory hierarchy enhancements, and surveys state-of-the-art FPGA-based neural network accelerators.

Abstract

Deep learning (DL) has developed rapidly and now performs complex tasks such as image recognition, natural language processing, and autonomous decision-making with high accuracy. As these technologies evolve to meet the growing demands of real-world applications, DL models continue to grow in complexity. They process massive volumes of data and therefore demand substantial computational power and memory bandwidth, which creates a critical need for hardware accelerators that deliver both high performance and energy efficiency. Accelerator types include ASIC-based solutions, GPU accelerators, and FPGA-based implementations. The limitations of ASIC and GPU accelerators have made FPGAs one of the prominent solutions, offering distinct advantages for DL workloads: FPGAs provide a flexible, reconfigurable platform that allows model-specific customization while maintaining high efficiency. This article explores hardware-level optimizations for DL, including loop pipelining, parallelism, quantization, and memory hierarchy enhancements. It also provides an overview of state-of-the-art FPGA-based neural network accelerators. The study and analysis of these accelerators identifies several challenges, which point the way to future optimizations and innovations in the design of FPGA-based hardware accelerators.

Paper Structure (20 sections, 14 figures, 5 tables)

This paper contains 20 sections, 14 figures, 5 tables.

Introduction
Background
Graphics Processing Units (GPU)
ASIC based Accelerator
Neural Processing Units (NPUs)
Tensor Processing Units (TPUs)
FPGA based Accelerator
Architecture of FPGA and Basic Structure of FPGA as Hardware Accelerators
Model-Specific Design Approaches for FPGA based AI Accelerators
Convolutional Neural Network (CNN)
Spiking Neural Network (SNN)
Recurrent Neural Network (RNN)
Graph Neural Network (GNN)
Hardware-Level Optimization Strategies
Computation Level Optimizations
...and 5 more sections

Figures (14)

Figure 1: Performance Comparison Metrics Across CPU, GPU, ASIC and FPGA.
Figure 2: Key features, existing optimization techniques, limitations & need for further optimization of FPGA-based hardware accelerators.
Figure 3: Architecture of AccUDNN showing the process flow between the memory optimizer and hyperparameter tuner modules. 8988598
Figure 4: Architecture of an analog in-memory computing (AIMC) core with integrated crossbar arrays & memory-based unit cells. 10.1063/5.0168089
Figure 5: Architecture of TPU v1 showing dataflow between the systolic array, unified buffer & high-bandwidth memory interface. 8358031
...and 9 more figures
