DISCA: A Digital In-memory Stochastic Computing Architecture Using A Compressed Bent-Pyramid Format

Shady Agwa, Yikang Shen, Shiwei Wang, Themis Prodromakis

TL;DR

This work introduces DISCA, a Digital In-SRAM Stochastic Computing Architecture tailored for matrix-matrix multiplication in AI workloads. Using a compressed 8-bit Bent-Pyramid data format (BP8) and bitline computing, DISCA performs stochastic multiplication inside the memory array and accumulates the results in a binary back-end, avoiding the heavy decoders and microcontrollers typical of digital in-memory computing (IMC). Post-layout modeling results in 180 nm CMOS show 3.59 TOPS/W per bit at 500 MHz for a 128 KB engine, with potential for a two-orders-of-magnitude improvement when scaled to advanced nodes. Because DISCA relies on conventional CMOS technology rather than emerging devices, it offers a practical path to addressing the memory-bandwidth and energy challenges of edge AI and beyond.

Abstract

We are witnessing an Artificial Intelligence revolution that dominates the technology landscape in application domains such as healthcare, robotics, automotive, security, and defense. Massive-scale AI models, which mimic the human brain's functionality, typically feature millions or even billions of parameters and rely on data-intensive matrix multiplication. While conventional von Neumann architectures struggle with the memory wall and the end of Moore's Law, these AI applications are migrating rapidly towards the edge, for example in robotics and unmanned aerial vehicles for surveillance, which further tightens the hardware budget of edge AI architectures. Although in-memory computing has been proposed as a promising solution to the memory wall, both analog and digital in-memory computing architectures lose much of their promised benefit to various design limitations. We propose a new digital in-memory stochastic computing architecture, DISCA, which uses a compressed version of the quasi-stochastic Bent-Pyramid data format. DISCA inherits the computational simplicity of analog computing while preserving the scalability, productivity, and reliability of digital systems. Post-layout modeling results of DISCA show an energy efficiency of 3.59 TOPS/W per bit at 500 MHz using a commercial 180 nm CMOS technology. If scaled, DISCA is therefore expected to improve the energy efficiency of matrix multiplication workloads by orders of magnitude compared to its counterpart architectures.
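The core idea behind stochastic multiplication, a single AND gate per bit pair followed by a binary count, can be illustrated with a minimal Python sketch. It does not reproduce the Bent-Pyramid format, whose encoding is not described in this summary. It assumes a generic deterministic unary (thermometer) encoding instead, in which one stream is repeated and the other has each bit held, so that every bit of one operand meets every bit of the other. The names `unary_stream` and `sc_multiply` are hypothetical.

```python
def unary_stream(value, length):
    """Thermometer code: `value` ones followed by zeros (assumed encoding)."""
    if not 0 <= value <= length:
        raise ValueError("value must lie in [0, length]")
    return [1] * value + [0] * (length - value)


def sc_multiply(a, b, length=8):
    """Multiply a/length by b/length using only bitwise AND and a count.

    Returns the number of ones in the product stream, out of length**2.
    """
    # Operand A: repeat the whole stream `length` times.
    stream_a = unary_stream(a, length) * length
    # Operand B: hold each bit for `length` cycles.
    stream_b = [bit for bit in unary_stream(b, length) for _ in range(length)]
    # Stochastic multiplication is a bitwise AND; accumulation is a binary count.
    return sum(x & y for x, y in zip(stream_a, stream_b))


ones = sc_multiply(3, 5, length=8)
print(ones, "/", 8 * 8)  # 15 / 64, i.e. (3/8) * (5/8)
```

With this pairing scheme the count equals `a * b` exactly, at the cost of a stream of `length**2` bits. Compressed formats such as the one proposed in the paper aim to avoid that growth, which this sketch makes no attempt to model.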

Paper Structure

This paper contains 9 sections, 5 figures.

Figures (5)

  • Figure 1: A compressed 8-bit version of the Bent-Pyramid datasets, achieving the same functionality as the 10-bit Bent-Pyramid format.
  • Figure 2: In-memory computing for matrix-matrix multiplication: (a) an example of matrix-matrix multiplication workload including the micro-algorithm, where input matrix $L$ is multiplied by weight matrix $U$ to generate the output matrix $O$; (b) digital in-memory computing architecture for binary-based computing; (c) the proposed DISCA architecture for matrix-matrix multiplication using in-situ stochastic multiplication and near-memory binary accumulation.
  • Figure 3: An 8CX128R SRAM core slice, including the layout of an 8x8 macro of 6T bitcells, pre-charge circuitry, write drivers, and sense amplifiers.
  • Figure 4: Simulations of eight read operations for different 8-bit values stored in the SRAM core.
  • Figure 5: Simulation results of SC multiplication using the bitline computing approach, where the upper matrix $U$ and lower matrix $L$ are logically represented by even wordlines and odd wordlines, respectively.
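
The arrangement described in the captions of Figures 2(c) and 5 can be sketched behaviourally in Python. The sketch rests on the standard bitline-computing property that activating two wordlines of a 6T SRAM column at once leaves the bitline high only if both cells store 1, so the sense amplifier reads a bitwise AND. The 8-bit rows are treated as opaque stochastic words, because the BP8 encoding is not given here. The function names and the example array contents are hypothetical.

```python
def bitline_and(array, wordline_a, wordline_b):
    """Model a dual-wordline read: the bitlines sense the AND of two rows."""
    return array[wordline_a] & array[wordline_b]


def popcount(word):
    """Number of ones in a word (the binary accumulation step)."""
    return bin(word).count("1")


def in_memory_dot(array):
    """Accumulate AND products of each even/odd wordline pair.

    Even wordlines are assumed to hold words of one operand matrix and
    odd wordlines the matching words of the other, as in the Figure 5 caption.
    """
    total = 0
    for wordline in range(0, len(array) - 1, 2):
        product = bitline_and(array, wordline, wordline + 1)  # in-situ multiply
        total += popcount(product)                            # near-memory add
    return total


# Hypothetical 4-row, 8-column slice of the SRAM core.
sram = [0b11110000, 0b10101010, 0b00001111, 0b00000011]
print(in_memory_dot(sram))  # 4
```

The multiplication never leaves the array in this model, and only the popcount and running sum need conventional binary logic, which is the split between in-situ stochastic multiplication and near-memory binary accumulation that the Figure 2(c) caption describes.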