Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis

TL;DR

This survey analyzes CNN-to-FPGA toolflows, comparing their input interfaces, hardware architectures, design space exploration methods, arithmetic precision, and achieved performance. It highlights two dominant architectural paradigms (streaming designs with one pipeline stage per layer vs. designs built around a single computation engine) and shows that RTL-based, analytically guided design flows often yield higher quality of results (QoR) than brute-force or purely HLS-based approaches. The authors argue for a standardized evaluation methodology and a benchmark suite to enable fair comparisons across toolflows, and they map out future directions, including support for next-generation CNNs, compressed networks, low-precision training, and hardware-software co-design. By clarifying current capabilities, limitations, and research avenues with practical impact, the work aims to catalyze broader FPGA adoption in deep learning for both embedded and data-center deployments.
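The two architectural paradigms can be contrasted with a deliberately simplified cycle model. This is an illustrative sketch, not the performance model of any surveyed toolflow: it assumes convolution layers mapped onto arrays of tn x tm multiply-accumulate (MAC) units, one MAC per unit per cycle, and no memory stalls. The layer shapes, tile sizes, and the names ConvLayer, layer_cycles, single_engine_cycles and streaming_interval are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ConvLayer:
    n_in: int   # input channels
    m_out: int  # output channels
    rows: int   # output feature-map height
    cols: int   # output feature-map width
    k: int      # square kernel size (k x k)


def layer_cycles(layer: ConvLayer, tn: int, tm: int) -> int:
    """Cycles for one conv layer on a tn x tm MAC array.

    Simplifying assumptions (hypothetical model): each MAC unit performs one
    multiply-accumulate per cycle and memory never stalls the array. Channel
    counts that are not multiples of the tile sizes leave some units idle,
    which the ceil terms capture.
    """
    return (math.ceil(layer.n_in / tn) * math.ceil(layer.m_out / tm)
            * layer.rows * layer.cols * layer.k * layer.k)


def single_engine_cycles(layers, tn: int, tm: int) -> int:
    """Single computation engine: one shared tn x tm array executes the
    layers one after another, so per-input cycles are summed over layers."""
    return sum(layer_cycles(layer, tn, tm) for layer in layers)


def streaming_interval(layers, tiles) -> int:
    """Streaming architecture: each layer has its own (tn, tm) stage and the
    stages work on different inputs concurrently, so in steady state a new
    input is accepted every max(stage cycles) cycles."""
    return max(layer_cycles(layer, tn, tm)
               for layer, (tn, tm) in zip(layers, tiles))


# Hypothetical two-layer network and a budget of 64 MAC units.
layers = [ConvLayer(3, 16, 32, 32, 3), ConvLayer(16, 32, 16, 16, 3)]
print("single engine 4x16:", single_engine_cycles(layers, 4, 16))
print("streaming 3x8 + 4x10:", streaming_interval(layers, [(3, 8), (4, 10)]))
print("streaming 3x6 + 4x11:", streaming_interval(layers, [(3, 6), (4, 11)]))
```

Under these assumptions the 4 x 16 engine needs 27,648 cycles per input. Splitting the same 64 MACs into stages of 3 x 8 and 4 x 10 gives a streaming initiation interval of 36,864 cycles, while 3 x 6 and 4 x 11 (62 MACs) balances both stages at 27,648 cycles. Choosing among such allocations is the kind of trade-off a toolflow's design space exploration has to resolve; the model ignores off-chip bandwidth and per-input latency, which real toolflows must also account for.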

Abstract

In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated into the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics, which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.

Paper Structure

This paper contains 18 sections, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Example of a streaming accelerator architecture
  • Figure 2: Example of a single computation engine accelerator
  • Figure 3: Comparison of mapping LeNet-5, CIFAR-10 and GoogLeNet onto Zynq XC7Z045
  • Figure 4: Comparison targeting Zynq platforms
  • Figure 5: DSP-normalised comparison of mapping AlexNet, VGG16 and ResNet-152 onto Stratix V
  • ...and 3 more figures
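Figure 5 compares toolflows after normalising by DSP usage. Assuming the normalisation divides each design's measured throughput by the number of DSP blocks it occupies (an assumption about the exact metric, which this summary does not spell out), the plotted quantity can be written as:

```latex
\text{Performance density} \;=\; \frac{\text{Throughput (GOp/s)}}{\text{Number of DSP blocks used}} \qquad [\text{GOp/s/DSP}]
```

For instance, a hypothetical design sustaining 200 GOp/s on 400 DSP blocks would score 0.5 GOp/s/DSP. Normalising this way makes designs that use different numbers of DSP blocks easier to compare, although it says nothing about their logic or on-chip memory usage.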