FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

Yaman Umuroglu, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers

TL;DR

This paper tackles the computational burden of convolutional neural networks by leveraging binarized networks on FPGAs. It introduces FINN, a parameterizable, heterogeneous streaming framework that maps trained binarized models to hardware using specialized blocks and optimizations (popcount, thresholded BN, OR pooling) to maximize throughput while keeping parameters on-chip. The design flow generates FPGA accelerators from Theano-trained networks and demonstrates up to 12.3 million classifications per second with sub-microsecond latency on MNIST, plus strong results on CIFAR-10 and SVHN, with favorable energy and resource efficiency. These results show that real-time embedded vision with binarized networks is practical on commodity FPGA platforms, enabling responsive edge AI applications.

Abstract

Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optimizations that enable efficient mapping of binarized neural networks to hardware, we implement fully connected, convolutional and pooling layers, with per-layer compute resources being tailored to user-provided throughput requirements. On a ZC706 embedded FPGA platform drawing less than 25 W total system power, we demonstrate up to 12.3 million image classifications per second with 0.31 μs latency on the MNIST dataset with 95.8% accuracy, and 21906 image classifications per second with 283 μs latency on the CIFAR-10 and SVHN datasets with respectively 80.1% and 94.9% accuracy. To the best of our knowledge, ours are the fastest classification rates reported to date on these benchmarks.
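The optimizations named in the TL;DR (popcount, thresholded batch normalization, OR pooling) can be illustrated with a small software sketch. This is not the FINN hardware implementation. It is a minimal model that assumes {-1, +1} values are stored as bits {0, 1} packed into Python integers, and that the batch-norm scale is positive so the activation reduces to a "greater than or equal" comparison. All function names and the threshold values are hypothetical and chosen only for illustration.

```python
def xnor_popcount(x_bits: int, w_bits: int, n: int) -> int:
    """Count positions where two n-bit vectors agree (popcount of XNOR)."""
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    xnor = ~(x_bits ^ w_bits) & mask    # bit is 1 where inputs match
    return bin(xnor).count("1")


def bipolar_dot(x_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two {-1,+1} vectors encoded as {0,1} bits.

    Each matching position contributes +1 and each mismatch -1,
    so dot = matches - (n - matches) = 2 * matches - n.
    """
    return 2 * xnor_popcount(x_bits, w_bits, n) - n


def threshold_neuron(x_bits: int, w_bits: int, n: int, threshold: int) -> int:
    """Binary activation: batch norm plus sign folded into one comparison.

    Assumes a positive batch-norm scale; `threshold` is a hypothetical
    precomputed constant, not a value from the paper.
    """
    return 1 if xnor_popcount(x_bits, w_bits, n) >= threshold else 0


def or_pool(window) -> int:
    """Max pooling over binary activations is a boolean OR of the window."""
    out = 0
    for bit in window:
        out |= bit
    return out


if __name__ == "__main__":
    x, w, n = 0b1011, 0b1101, 4
    print(xnor_popcount(x, w, n))        # number of agreeing bit positions
    print(bipolar_dot(x, w, n))          # equivalent {-1,+1} dot product
    print(threshold_neuron(x, w, n, 2))  # binary output for threshold 2
    print(or_pool([0, 0, 1, 0]))         # pooled binary activation
```

The sketch shows why these operations are attractive in hardware: the multiply-accumulate becomes an XNOR followed by a bit count, the activation becomes a single integer comparison, and pooling becomes a logical OR.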

Paper Structure

This paper contains 29 sections, 2 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Roofline model for a ZU19EG.
  • Figure 2: Heterogeneous streaming.
  • Figure 3: Three examples of binary neuron activations with batch normalization. A slight vertical offset is added for clarity.
  • Figure 4: Generating an FPGA accelerator from a trained BNN.
  • Figure 5: Overview of the MVTU.
  • ...and 6 more figures