Convolutions Predictable Offloading to an Accelerator: Formalization and Optimization

Benjamin Husson, Mohammed Belcaïd, Thomas Carle, Claire Pagetti

Abstract

Convolutional neural networks (CNNs) require a large number of multiply-accumulate (MAC) operations. To meet real-time constraints, they often need to be executed on specialized accelerators composed of an on-chip memory and a processing unit. However, the on-chip memory is often too small to store all the data required to compute a CNN layer, so the computation must be performed in several offloading steps. We formalise such sequences of steps and apply our formalism to a state-of-the-art decomposition of convolutions. To find strategies of optimal duration, we encode the problem as a set of constraints. A Python-based simulator allows in-depth analysis of the computed strategies.
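
To make the idea of offloading steps concrete, the sketch below counts how many steps a single convolution layer would need under a deliberately simple, hypothetical memory model. It is not the formalism or the decomposition used in the paper: the function name `plan_steps`, the assumption that the whole input stays on chip, and the choice to split the work by groups of kernels are all illustrative assumptions.

```python
from math import ceil

def plan_steps(h, w, c, kh, kw, n_kernels, mem_capacity):
    """Count offloading steps for a stride-1, unpadded 2D convolution.

    Hypothetical model (an assumption, not the paper's): the whole input
    tensor stays on chip, and each step loads a group of kernels and
    produces the matching output channels. Sizes are in tensor elements.
    """
    out_h, out_w = h - kh + 1, w - kw + 1
    input_size = h * w * c
    # One kernel plus the single output channel it produces.
    per_kernel = kh * kw * c + out_h * out_w
    free = mem_capacity - input_size
    if free < per_kernel:
        raise ValueError("input plus one kernel does not fit on chip")
    group = min(n_kernels, free // per_kernel)   # kernels handled per step
    n_steps = ceil(n_kernels / group)
    macs = out_h * out_w * kh * kw * c * n_kernels  # total MAC operations
    return n_steps, group, macs

if __name__ == "__main__":
    # 8x8x3 input, sixteen 3x3x3 kernels, 600 elements of on-chip memory.
    print(plan_steps(8, 8, 3, 3, 3, 16, 600))
```

Under this toy model, a larger on-chip memory lets more kernels share one step, which reduces the number of steps. The total MAC count does not depend on the split.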

Paper Structure (34 sections, 22 equations, 13 figures, 1 table)

Figures (13)

  • Figure 1: Generic accelerator architecture
  • Figure 2: A step = sequence of execution
  • Figure 3: Multi-core with local SPM (e.g. AURIX)
  • Figure 4: Eyeriss architecture
  • Figure 5: TMMA architecture
  • ...and 8 more figures

Theorems & Definitions (25)

  • Definition 1: n-step computation
  • Definition 2: Semantics of an n-step computation
  • Definition 3: Duration of an n-step strategy
  • Definition 4: Tensor
  • Definition 5: 2D convolution operation
  • Remark 1
  • Definition 6: 3D-Input tensor
  • Definition 7: Kernels
  • Definition 8: 3D-Output tensor
  • Remark 2
  • ...and 15 more