Table of Contents
Fetching ...

Hardware-Software Co-optimised Fast and Accurate Deep Reconfigurable Spiking Inference Accelerator Architecture Design Methodology

Anagha Nimbekar, Prabodh Katti, Chen Li, Bashir M. Al-Hashimi, Amit Acharyya, Bipin Rajendran

TL;DR

A hardware-software co-optimisation strategy to port software-trained deep neural networks (DNN) to reduced-precision spiking models demonstrating fast and accurate inference in a novel event-driven CMOS reconfigurable spiking inference accelerator.

Abstract

Spiking Neural Networks (SNNs) have emerged as a promising approach to improve the energy efficiency of machine learning models, as they naturally implement event-driven computations while avoiding expensive multiplication operations. In this paper, we develop a hardware-software co-optimisation strategy to port software-trained deep neural networks (DNN) to reduced-precision spiking models demonstrating fast and accurate inference in a novel event-driven CMOS reconfigurable spiking inference accelerator. Experimental results show that a reduced-precision Resnet-18 and VGG-11 SNN models achieves classification accuracy within 1% of the baseline full-precision DNN model within 8 spike timesteps. We also demonstrate an FPGA prototype implementation of the spiking inference accelerator with a throughput of 38.4 giga operations per second (GOPS) consuming 1.54 Watts on PYNQ-Z2 FPGA. This corresponds to 0.6 GOPS per processing element and 2.25,GOPS/DSP slice, which is 2x and 4.5x higher utilisation efficiency respectively compared to the state-of-the-art. Our co-optimisation strategy can be employed to develop deep reduced precision SNN models and port them to resource-efficient event-driven hardware accelerators for edge applications.

Hardware-Software Co-optimised Fast and Accurate Deep Reconfigurable Spiking Inference Accelerator Architecture Design Methodology

TL;DR

A hardware-software co-optimisation strategy to port software-trained deep neural networks (DNN) to reduced-precision spiking models demonstrating fast and accurate inference in a novel event-driven CMOS reconfigurable spiking inference accelerator.

Abstract

Spiking Neural Networks (SNNs) have emerged as a promising approach to improve the energy efficiency of machine learning models, as they naturally implement event-driven computations while avoiding expensive multiplication operations. In this paper, we develop a hardware-software co-optimisation strategy to port software-trained deep neural networks (DNN) to reduced-precision spiking models demonstrating fast and accurate inference in a novel event-driven CMOS reconfigurable spiking inference accelerator. Experimental results show that a reduced-precision Resnet-18 and VGG-11 SNN models achieves classification accuracy within 1% of the baseline full-precision DNN model within 8 spike timesteps. We also demonstrate an FPGA prototype implementation of the spiking inference accelerator with a throughput of 38.4 giga operations per second (GOPS) consuming 1.54 Watts on PYNQ-Z2 FPGA. This corresponds to 0.6 GOPS per processing element and 2.25,GOPS/DSP slice, which is 2x and 4.5x higher utilisation efficiency respectively compared to the state-of-the-art. Our co-optimisation strategy can be employed to develop deep reduced precision SNN models and port them to resource-efficient event-driven hardware accelerators for edge applications.

Paper Structure

This paper contains 12 sections, 2 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: ANN-to-SNN conversion strategy for developing reduced precision SNN models for fast inference in hardware.
  • Figure 2: High-level block diagram of the proposed Spiking Inference Accelerator (SIA). The spiking core consists of 64 PE elements to execute spiking convolution while the aggregation core executes neuronal computation.
  • Figure 3: Ping-pong protocol for membrane potential storage (green: read, red: write)
  • Figure 4: High-level block diagram of the FPGA Implementation of our spiking inference accelerator (SIA).
  • Figure 5: Implementation flow to configure the proposed SIA architecture on FPGA.
  • ...and 4 more figures