An Event-Based Digital Compute-In-Memory Accelerator with Flexible Operand Resolution and Layer-Wise Weight/Output Stationarity

Nicolas Chauvaux, Adrian Kneip, Christoph Posch, Kofi Makinwa, Charlotte Frenkel

TL;DR

FlexSpIM is a novel digital CIM macro that supports arbitrary operand resolution and shape within a unified CIM storage for weights and membrane potentials. It can save up to 90% energy in large-scale systems while reaching state-of-the-art classification accuracy on the IBM DVS gesture dataset.

Abstract

Compute-in-memory (CIM) accelerators for spiking neural networks (SNNs) are promising solutions to enable $\mu$s-level inference latency and ultra-low energy in edge vision applications. Yet, their current lack of flexibility at both the circuit and system levels prevents their deployment in a wide range of real-life scenarios. In this work, we propose a novel digital CIM macro that supports arbitrary operand resolution and shape, with a unified CIM storage for weights and membrane potentials. These circuit-level techniques enable a hybrid weight- and output-stationary dataflow at the system level to maximize operand reuse, thereby minimizing costly on- and off-chip data movements during the SNN execution. Measurement results of a fabricated FlexSpIM prototype in 40-nm CMOS demonstrate a 2$\times$ increase in bit-normalized energy efficiency compared to prior fixed-precision digital CIM-SNNs, while providing resolution reconfiguration with bitwise granularity. Our approach can save up to 90% energy in large-scale systems, while reaching a state-of-the-art classification accuracy of 95.8% on the IBM DVS gesture dataset.

Paper Structure

This paper contains 8 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: (a) Event-based edge vision system and workload example. (b) Integrate-and-fire spiking neuron model. (c) Adopted execution flow targeting low-latency execution. (d-f) Three core challenges in the state of the art and corresponding innovations in the proposed design.
  • Figure 2: (a) Digital CIM memory overview: array of bitcells and peripheral circuit (PC) attached to each bitline (BL) for operation handling. For digital CIM operations, two wordlines (WLs) are activated simultaneously. (b) Example of a digital CIM operation between two bitcells storing A=1 and B=0. Two Boolean operations are obtained and can be combined into a 1-bit full adder using the equations provided. (c) Example of waveforms and phases to perform the digital CIM operation illustrated in (b). (d) Architecture of the proposed FlexSpIM digital CIM-SRAM macro. (e) Decomposition of a PC into modules with their detailed schematics.
  • Figure 3: (a) Arbitrary resolution and operand shaping principle. (b) Example of operand shaping for 5-bit weight and 10-bit membrane potential with a selected parallelization of one neuron. (c) Example of operand shaping for 6-bit weight and 9-bit membrane potential with a selected parallelization of three neurons. (d) Carry-selection logic and PC state configuration modes. (e) PC configurations for bit-serial and 4$\times$3 operand shaping.
  • Figure 4: (a) Layer-level memory requirements for weights and membrane potentials of a spiking CNN composed of six convolutional layers (i.e., L1 to L6) and three fully connected (FC) layers (not shown), with two different hybrid-stationary (HS) dataflow situations highlighted by the brown and pink lines. (b) Mapping of the model on two CIM macros for weight-stationary-only (WS-only) and HS-min dataflows.
  • Figure 5: (a) Overall FlexSpIM system architecture and (b) chip microphotograph in bulk 40-nm CMOS.
  • ...and 2 more figures
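
The Figure 2(b) caption mentions that activating two wordlines yields two Boolean operations that can be combined into a 1-bit full adder, but the equations themselves are not reproduced on this page. The following is a minimal sketch of how that could work, under the assumption that the two operations are AND and NOR (a common choice in digital CIM-SRAM designs, not confirmed here for FlexSpIM). The function names and the bit-serial loop are hypothetical and only illustrate the idea of adding a weight to a membrane potential one bit at a time.

```python
def bitline_ops(a, b):
    """Assumed bitline read-out when two wordlines are active.

    Returns (A AND B, A NOR B). This pairing is an assumption,
    not taken from the paper's equations.
    """
    return a & b, 1 - (a | b)


def full_adder(a, b, carry_in):
    """1-bit full adder built only from the assumed AND/NOR outputs."""
    and_ab, nor_ab = bitline_ops(a, b)
    xor_ab = 1 - (and_ab | nor_ab)          # XOR = NOR(AND, NOR)
    sum_bit = xor_ab ^ carry_in
    carry_out = and_ab | (xor_ab & carry_in)
    return sum_bit, carry_out


def bit_serial_add(x, y, n_bits):
    """Add two unsigned integers LSB-first, keeping n_bits of result.

    Returns (result, final_carry). Wrapping on overflow is an
    illustrative choice, not the chip's documented behavior.
    """
    result, carry = 0, 0
    for i in range(n_bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


# Caption example: bitcells storing A=1 and B=0, no incoming carry.
print(bitline_ops(1, 0))    # (0, 0)
print(full_adder(1, 0, 0))  # (1, 0)

# Hypothetical 5-bit weight added to a 10-bit membrane potential.
print(bit_serial_add(700, 19, 10))  # (719, 0)
```

The resolution is just the loop bound `n_bits`, which is one way to picture why a bit-serial scheme can be reconfigured with bitwise granularity.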