When In-memory Computing Meets Spiking Neural Networks -- A Perspective on Device-Circuit-System-and-Algorithm Co-design

Abhishek Moitra; Abhiroop Bhattacharjee; Yuhang Li; Youngeun Kim; Priyadarshini Panda

Abstract

This review explores the intersection of bio-plausible artificial intelligence, in the form of Spiking Neural Networks (SNNs), with analog In-Memory Computing (IMC), highlighting their combined potential for low-power edge computing. Through detailed investigation at the device, circuit, and system levels, we highlight the pivotal synergies between SNNs and IMC architectures. We also emphasize the critical need for comprehensive system-level analyses that account for the inter-dependencies between algorithm, device, circuit, and system parameters, which are crucial for optimal performance. This analysis identifies key system-level bottlenecks arising from device limitations, which can be addressed using SNN-specific algorithm-hardware co-design techniques. The review underscores the need for holistic device-to-system design space co-exploration and highlights the critical aspects of hardware and algorithm research for low-power neuromorphic solutions.

Paper Structure (25 sections, 7 equations, 13 figures, 4 tables)

Introduction
SNN Algorithm and Application Space
Inherent Efficiencies in SNNs
State-of-the-art SNN Training Algorithms
Conventional Learning Algorithms
Back Propagation Through Time
Application Space for SNNs
IMC Accelerators for SNNs
von-Neumann and IMC Accelerators
Standard Hardware Evaluation Metrics
Synergies between IMC Accelerators and SNNs
System-level Analyses of IMC-SNN
IMC Hardware Evaluation Platform
Need for System-level Analyses of IMC-SNN
SNN System-level Bottlenecks and Mitigation Strategies
...and 10 more sections

Figures (13)

Figure 1: Landscape of the Spiking Neural Network (SNN) algorithm and the In-Memory Computing (IMC) device, circuit, and system parameters. Reaching fully optimal SNN-IMC implementations requires considering the co-dependencies between IMC device, circuit, system, and SNN algorithm parameters. LIF denotes the leaky integrate-and-fire neuron, a fundamental non-linear activation unit in SNNs. For each domain, the parameters that underlie hardware metrics such as performance, latency, energy efficiency, area, and power are listed.
Figure 2: Functioning of an SNN. Input spikes are sent to the SNN across multiple timesteps. These binary spikes are multiplied with the SNN weights ($w$) to generate Multiply-and-Accumulate (MAC) values that charge the membrane potential $U$ over multiple timesteps. At any timestep, if the membrane potential exceeds a pre-defined threshold ($\theta$), an output spike is generated.
Figure 3: (a) Training computation graph for the BPTT algorithm. The gradients pass through different layers and timesteps. (b) SNN accuracy on image classification tasks over the years. We show accuracy on two widely used benchmarks, CIFAR10 \cite{cifar10} and ImageNet \cite{deng2009imagenet}, which comprise 50,000 and 1.2 million images with 10 and 1000 classes, respectively. SNNs trained with BPTT can achieve high accuracy at low timesteps while scaling to large datasets.
Figure 4: (a) von-Neumann accelerators containing on-chip cache, scratch pad memories, multipliers, and accumulators for performing MAC operations. (b) IMC architectures containing 2D arrays of 1 transistor-1 memristor (1T-1R) devices. They perform fast analog dot-products and minimize data transfer, mitigating the "memory wall bottleneck" typical of von-Neumann architectures. Over the years, different non-volatile memory (NVM) devices such as phase change memory (PCM), ferro-electric field effect transistor (FeFET), resistive random access memory (RRAM), and spin-transfer torque magnetic RAM (STT-MRAM) have been used as memristors.
Figure 5: (a) Plot of energy-efficiency (measured in TOPS/W) vs. power for different low-power edge AI accelerators. ANN workloads are deployed on CPU (Intel Movidius \cite{intel_movidius}, Kalray \cite{kalray}), GPU (Nvidia Jetson Orin Nano \cite{nvidia_orin_nano}, Nvidia Xavier \cite{nvidia_xavier}), systolic accelerators (Eyeriss-V1 \cite{chen2016eyeriss}, Eyeriss-V2 \cite{chen2019eyeriss}), and IMC (Neurosim \cite{chen2018neurosim}) accelerators. SNNs are deployed on the SATA \cite{yin2022sata} systolic array and SpikeSim \cite{moitra2023spikesim} IMC accelerator platforms. SATA \cite{yin2022sata} and SpikeSim \cite{moitra2023spikesim} are SNN-specific accelerators that closely resemble the Eyeriss \cite{chen2016eyeriss} and Neurosim \cite{chen2018neurosim} platforms, respectively, and thus facilitate a fair comparison. "$+$" denotes the conjunction of two approaches. Arrows show the reduction in power and the improvement in energy-efficiency. Higher energy-efficiency at lower power signifies a good AI accelerator platform. (b) Plot comparing the activation sparsity averaged across different layers of the ANN and SNN. Plots comparing (c) IMC chip area, (d) TOPS, and (e) TOPS/mm$^2$ of the ANN and SNN implemented on the Neurosim \cite{chen2018neurosim} and SpikeSim \cite{moitra2023spikesim} platforms, respectively. For all implementations we use an 8-bit VGG16 ANN and SNN (with 4 timesteps) trained on the CIFAR10 dataset. The SpikeSim \cite{moitra2023spikesim} and Neurosim \cite{chen2018neurosim} hardware parameters are shown in Tables \ref{tab:xbar_params} and \ref{tab:xbar_params2} in the Appendix, respectively.
...and 8 more figures
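
The neuron behaviour described in the Figure 2 caption (binary input spikes weighted into MAC values, accumulation into a membrane potential $U$, and a spike whenever $U$ exceeds $\theta$) can be illustrated with a minimal sketch. The function name `lif_forward`, the `leak` factor, and the reset-to-zero after a spike are assumptions made for illustration; the caption itself specifies only the accumulation and the threshold comparison.

```python
def lif_forward(spike_train, weights, theta=1.0, leak=1.0):
    """Run one layer of spiking neurons over all timesteps.

    spike_train: list of T timesteps, each a list of n_in binary spikes.
    weights:     n_in x n_out nested list (the weights w).
    theta:       firing threshold.
    leak:        assumed multiplicative leak on U (1.0 means no leak).
    Returns a T x n_out nested list of binary output spikes.
    """
    n_out = len(weights[0])
    u = [0.0] * n_out  # membrane potential U, one per output neuron
    outputs = []
    for spikes_t in spike_train:
        row = []
        for j in range(n_out):
            # Binary spikes times weights: the MAC value for this timestep.
            mac = sum(s * weights[i][j] for i, s in enumerate(spikes_t))
            # The MAC value charges the (optionally leaky) membrane potential.
            u[j] = leak * u[j] + mac
            if u[j] > theta:
                row.append(1)
                u[j] = 0.0  # assumed hard reset after firing
            else:
                row.append(0)
        outputs.append(row)
    return outputs


# Two inputs, one output neuron, four timesteps.
print(lif_forward([[1, 0], [1, 0], [0, 1], [1, 1]], [[0.6], [0.5]]))
# prints [[0], [1], [0], [1]]
```

Because the inputs are binary, each MAC reduces to summing the weights of the inputs that spiked, and a timestep with no input spikes adds nothing to $U$.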