Hardware-oriented Approximation of Convolutional Neural Networks

Philipp Gysel, Mohammad Motamedi, Soheil Ghiasi

TL;DR

This work addresses the high compute and memory demands of CNN inference with Ristretto, a hardware-oriented model approximation framework. Ristretto combines mixed and dynamic fixed-point representations, a five-stage quantization flow, and fine-tuning with full-precision shadow weights to produce fixed-point CNNs that retain accuracy while sharply reducing memory and arithmetic requirements. The results show that large networks such as CaffeNet and SqueezeNet can be condensed to 8-bit within an error tolerance of about 1%, which enables efficient hardware accelerators and on-chip storage. The approach targets mobile and embedded inference, and the open-source tooling is intended to ease adoption and further optimization, including pruning and binarization extensions.

Abstract

High computational complexity hinders the widespread usage of Convolutional Neural Networks (CNNs), especially in mobile devices. Hardware accelerators are arguably the most promising approach for reducing both execution time and power consumption. One of the most important steps in accelerator development is hardware-oriented model approximation. In this paper we present Ristretto, a model approximation framework that analyzes a given CNN with respect to numerical resolution used in representing weights and outputs of convolutional and fully connected layers. Ristretto can condense models by using fixed point arithmetic and representation instead of floating point. Moreover, Ristretto fine-tunes the resulting fixed point network. Given a maximum error tolerance of 1%, Ristretto can successfully condense CaffeNet and SqueezeNet to 8-bit. The code for Ristretto is available.
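
The fixed-point representation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not Ristretto's code: the function names, the round-to-nearest rule, and the way the fractional length is derived from the largest absolute value are assumptions made for illustration only.

```python
import math

def choose_frac_bits(values, bit_width=8):
    """Pick a fractional length so the largest magnitude in `values` fits.

    Assumed rule (illustrative, not taken from the paper): the integer
    length, sign bit included, is floor(log2(max_abs)) + 2.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bit_width - 1
    integer_length = math.floor(math.log2(max_abs)) + 2
    return bit_width - integer_length

def quantize_fixed_point(x, bit_width=8, frac_bits=4):
    """Round x to the nearest signed fixed-point value, saturating at the limits."""
    scale = 2 ** frac_bits
    q_min = -(2 ** (bit_width - 1))
    q_max = 2 ** (bit_width - 1) - 1
    q = math.floor(x * scale + 0.5)      # round half up
    q = max(q_min, min(q_max, q))        # saturate instead of wrapping around
    return q / scale

# Hypothetical layer weights: each layer can get its own fractional length.
layer_weights = [0.2, -3.7, 1.5]
frac_bits = choose_frac_bits(layer_weights, bit_width=8)
quantized = [quantize_fixed_point(w, 8, frac_bits) for w in layer_weights]
```

Choosing the fractional length per group of values, rather than once for the whole network, is the idea behind dynamic fixed point: layers with small values spend more bits on the fraction, layers with large values spend more on the integer part.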

Paper Structure

This paper contains 6 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Data path of quantized convolutional and fully connected layers.
  • Figure 2: Network approximation flow with Ristretto.
  • Figure 3: Fine-tuning with shadow weights. The left side shows the training process with full-precision shadow weights. On the right side the fine-tuned network is benchmarked on the validation data set. Fixed point values are represented in orange.
  • Figure 4: Impact of dynamic fixed point: The figure shows top-1 accuracy for CaffeNet on ILSVRC 2014 validation dataset. Integer length refers to the number of bits assigned to the integer part of fixed point numbers.
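
The shadow-weight fine-tuning described in the caption of Figure 3 can be sketched on a toy problem. The following is a hedged illustration, not the authors' implementation: the single-weight model, the squared-error loss, the learning rate, and the names `quantize` and `fine_tune` are all assumptions. The point it shows is that the forward pass and the gradient use the quantized weight, while the small updates accumulate in a full-precision shadow copy.

```python
import math

def quantize(x, bit_width=8, frac_bits=6):
    """Round x to the nearest signed fixed-point value, saturating at the limits."""
    scale = 2 ** frac_bits
    q = math.floor(x * scale + 0.5)
    q = max(-(2 ** (bit_width - 1)), min(2 ** (bit_width - 1) - 1, q))
    return q / scale

def fine_tune(shadow_w, data, lr=0.05, epochs=200):
    """Fit y = w * x with a quantized weight and a full-precision shadow weight."""
    for _ in range(epochs):
        for x, y in data:
            w_q = quantize(shadow_w)   # forward pass sees only the quantized weight
            err = w_q * x - y
            grad = 2 * err * x         # gradient of the squared error at w_q
            shadow_w -= lr * grad      # update is applied to the shadow weight
    return shadow_w, quantize(shadow_w)

# Toy data generated from w = 0.7, which is not exactly representable with 6 fractional bits.
data = [(1.0, 0.7), (2.0, 1.4)]
shadow_w, w_q = fine_tune(0.5, data)
```

Without the shadow copy, an update smaller than one quantization step would be rounded away immediately; keeping it in full precision lets many small updates add up until the quantized weight moves to the next representable value.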