StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators

Ethan G Rogers, Sohan Salahuddin Mugdho, Kshemal Kshemendra Gupte, Cheng Wang

TL;DR

An optimized design configuration using inhomogeneous sampling of stochastic partial sums (PS) achieves up to a 130x improvement in energy-delay product (EDP) compared to IMC with a full-precision ADC, while maintaining near-software accuracy on various benchmark classification tasks.
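Energy-delay product is energy multiplied by delay, so EDP gains compose multiplicatively. The worked check below assumes (the text does not say so) that the abstract's peak 16x energy and 8x latency improvements refer to the same configuration and baseline; under that assumption their product lands close to the quoted 130x EDP figure.

$$
\mathrm{EDP} = E \cdot D,
\qquad
\frac{\mathrm{EDP}_{\mathrm{ADC}}}{\mathrm{EDP}_{\mathrm{StoX}}}
= \frac{E_{\mathrm{ADC}}}{E_{\mathrm{StoX}}} \cdot \frac{D_{\mathrm{ADC}}}{D_{\mathrm{StoX}}}
\approx 16 \times 8 = 128
$$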

Abstract

Crossbar-based in-memory computing (IMC) has emerged as a promising platform for hardware acceleration of deep neural networks (DNNs). However, the energy and latency of IMC systems are dominated by the large overhead of the peripheral analog-to-digital converters (ADCs). To address this ADC bottleneck, we propose stochastic processing of array-level partial sums (PS) for efficient IMC. Leveraging the probabilistic switching of spin-orbit torque magnetic tunnel junctions (SOT-MTJs), the proposed PS processing eliminates the costly ADC, significantly improving energy and area efficiency. To mitigate accuracy loss, we develop PS-quantization-aware training that enables backward propagation through the stochastic PS. Furthermore, we propose a novel scheme with an inhomogeneous sampling length for the stochastic conversion. When running ResNet20 on the CIFAR-10 dataset, our architecture-to-algorithm co-design demonstrates up to 16x, 8x, and 10x improvement in energy, latency, and area, respectively, compared to IMC with standard ADCs. Our optimized design configuration using stochastic PS achieves a 130x (24x) improvement in energy-delay product (EDP) compared to IMC with full-precision ADCs (sparse low-bit ADCs), while maintaining near-software accuracy on various benchmark classification tasks.
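A minimal Python sketch of the stochastic partial-sum idea, under assumptions that are not taken from the paper. The MTJ switching probability is modeled as a sigmoid of the normalized partial sum with a made-up steepness `SENSITIVITY`. Each conversion averages `n_samples` binary reads in place of an ADC read-out. The backward pass uses the derivative of the expected output, a straight-through-style surrogate that is one common choice and not necessarily the paper's PS-quantization-aware training. `SAMPLES_PER_LAYER` is a hypothetical inhomogeneous sampling schedule. All function and variable names are illustrative.

```python
import math
import random

# Hypothetical device model: the SOT-MTJ switches to its "1" state with a
# probability that rises smoothly with the normalized partial sum. A sigmoid
# with an assumed steepness stands in for the real switching curve.
SENSITIVITY = 4.0


def switching_probability(ps, k=SENSITIVITY):
    """P(switch) for a normalized partial sum ps (roughly in [-1, 1])."""
    return 1.0 / (1.0 + math.exp(-k * ps))


def stochastic_convert(ps, n_samples, rng, k=SENSITIVITY):
    """Replace an ADC read with n_samples binary MTJ reads, averaged.

    Returns a bipolar value in [-1, 1]: +1 if every read switched, -1 if none did.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be >= 1")
    p = switching_probability(ps, k)
    ones = sum(1 for _ in range(n_samples) if rng.random() < p)
    return 2.0 * ones / n_samples - 1.0


def expected_output(ps, k=SENSITIVITY):
    """Mean of stochastic_convert as n_samples grows: 2p - 1 = tanh(k*ps/2)."""
    return 2.0 * switching_probability(ps, k) - 1.0


def surrogate_grad(ps, k=SENSITIVITY):
    """Backward-pass gradient: derivative of expected_output w.r.t. ps.

    The sampled output is piecewise constant in ps (zero gradient almost
    everywhere), so training differentiates the expectation instead:
    d(2*sigmoid(k*ps) - 1)/dps = 2*k*p*(1 - p).
    """
    p = switching_probability(ps, k)
    return 2.0 * k * p * (1.0 - p)


def mean_abs_error(ps, n_samples, trials=2000, seed=0):
    """Monte Carlo estimate of the conversion error for one sampling length."""
    rng = random.Random(seed)
    target = expected_output(ps)
    total = sum(abs(stochastic_convert(ps, n_samples, rng) - target)
                for _ in range(trials))
    return total / trials


# Hypothetical inhomogeneous sampling: longer sampling for sensitive layers.
SAMPLES_PER_LAYER = {"first_conv": 8, "middle_conv": 1, "last_fc": 4}

if __name__ == "__main__":
    for layer, n in SAMPLES_PER_LAYER.items():
        print(f"{layer}: {n} sample(s), mean |error| = {mean_abs_error(0.3, n):.3f}")
```

Running the script prints a Monte Carlo error estimate per hypothetical layer. Fewer samples give a cheaper but noisier conversion, which is presumably the trade-off an inhomogeneous sampling length exploits by spending extra samples only where a layer is error-sensitive.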

Paper Structure (16 sections, 5 equations, 9 figures, 4 tables, 1 algorithm)

Figures (9)

  • Figure 1: ADC bottleneck incurred by partial-sum processing in the IMC crossbar architecture.
  • Figure 2: Overview of the proposed crossbar MVM processing with a stochastic MTJ converter.
  • Figure 3: Computational flow of the MVM operation in the proposed StoX-Net architecture. In this example, inputs and weights are 4-bit fixed-point values ($A_b = W_b = 4$), and input streams and weight slices are 1-bit ($A_s = W_s = 1$).
  • Figure 4: The distribution of normalized array-level MVM outputs collected in a DNN model trained with stochastic MTJs ("StoX") compared with a model with sense amplifier (SA) behavior.
  • Figure 5: Monte Carlo simulation for determining the layer-wise importance and error sensitivity.
  • ...and 4 more figures
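The Figure 3 setting (4-bit inputs and weights, 1-bit streams and slices) can be sketched as a bit-serial, bit-sliced MVM with shift-and-add recombination. This is a minimal illustration assuming unsigned integers, with $A_s$ read as the input-stream width and $W_s$ as the weight-slice width; the signed-value handling a real design needs is omitted, and the `convert` hook is a hypothetical stand-in for whatever reads out each column's partial sum (an ADC or a stochastic converter).

```python
def bit(value, i):
    """i-th binary digit (little-endian) of a non-negative integer."""
    return (value >> i) & 1


def bit_sliced_mvm(inputs, weights, a_bits=4, w_bits=4, convert=None):
    """Compute y[c] = sum_r inputs[r] * weights[r][c] one bit pair at a time.

    inputs  : length-R list of unsigned a_bits-bit integers (streamed 1 bit per step)
    weights : R x C matrix of unsigned w_bits-bit integers (sliced 1 bit per slice)
    convert : optional stand-in for the per-column read-out; None means an
              ideal, exact read-out of each partial sum.
    """
    rows, cols = len(weights), len(weights[0])
    if len(inputs) != rows:
        raise ValueError("inputs length must match number of weight rows")
    if any(not 0 <= x < 2 ** a_bits for x in inputs):
        raise ValueError("inputs out of range for a_bits")
    if any(not 0 <= w < 2 ** w_bits for row in weights for w in row):
        raise ValueError("weights out of range for w_bits")

    out = [0] * cols
    for i in range(a_bits):          # input stream: bit i of every activation
        for j in range(w_bits):      # weight slice: bit j of every weight
            for c in range(cols):
                # Array-level partial sum for one column and one (stream, slice) pair.
                ps = sum(bit(inputs[r], i) * bit(weights[r][c], j) for r in range(rows))
                if convert is not None:
                    ps = convert(ps)
                out[c] += ps * 2 ** (i + j)  # shift-and-add by combined bit significance
    return out


inputs = [3, 15, 0, 7]
weights = [[1, 2], [4, 5], [9, 0], [15, 3]]
print(bit_sliced_mvm(inputs, weights))  # -> [168, 102]
```

With an exact read-out the result equals the ordinary integer MVM. With $A_b = W_b = 4$ and 1-bit streams and slices, each column produces 4 × 4 = 16 partial sums per MVM, each one a read-out that would otherwise go through an ADC.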