DISCA: A Digital In-memory Stochastic Computing Architecture Using A Compressed Bent-Pyramid Format
Shady Agwa, Yikang Shen, Shiwei Wang, Themis Prodromakis
TL;DR
This work introduces DISCA, a Digital In-SRAM Stochastic Computing architecture tailored for matrix-matrix multiplication in AI workloads. By combining a compressed Bent-Pyramid data format (BP8) with bitline computing, DISCA performs stochastic multiplication in memory and accumulates the results in a binary back-end, avoiding the heavy decoders and microcontrollers typical of digital in-memory computing (IMC). Hardware results at 180 nm show 3.59 TOPS/W per bit at 500 MHz for a 128 KB engine, with a potential improvement of roughly two orders of magnitude when scaled to advanced nodes. Because DISCA relies on conventional CMOS technology rather than emerging devices, it offers high throughput and energy efficiency and is a promising path for addressing memory-bandwidth and energy challenges in edge AI and beyond.
Abstract
We are witnessing an Artificial Intelligence revolution that dominates the technology landscape across application domains such as healthcare, robotics, automotive, security, and defense. Massive-scale AI models, which mimic the human brain's functionality, typically feature millions or even billions of parameters and rely on data-intensive matrix multiplication. While conventional von Neumann architectures struggle with the memory wall and the end of Moore's Law, these AI applications are migrating rapidly towards the edge, for example in robotics and in unmanned aerial vehicles for surveillance, which further constrains the hardware budget of edge AI architectures. Although in-memory computing has been proposed as a promising solution to the memory wall, both analog and digital in-memory computing architectures lose much of their expected benefit to various design limitations. We propose a new digital in-memory stochastic computing architecture, DISCA, which uses a compressed version of the quasi-stochastic Bent-Pyramid data format. DISCA inherits the computational simplicity of analog computing while preserving the scalability, productivity, and reliability of digital systems. Post-layout modeling results of DISCA show an energy efficiency of 3.59 TOPS/W per bit at 500 MHz using a commercial 180 nm CMOS technology. DISCA therefore improves the energy efficiency of matrix multiplication workloads by orders of magnitude when scaled and compared with its counterpart architectures.
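The idea of stochastic multiplication followed by binary accumulation, mentioned in the abstract above, can be illustrated with a minimal sketch of generic unipolar stochastic computing. This is not the Bent-Pyramid (BP8) format and not the DISCA datapath, neither of which is specified in this section. It assumes that values in [0, 1] are encoded as random bitstreams, that a bitwise AND of two independent streams approximates their product, and that a popcount plays the role of the binary accumulator. All function names, the stream length, and the seed are hypothetical choices made for illustration.

```python
import random


def to_stream(p, length, rng):
    """Encode a value p in [0, 1] as a random unipolar bitstream (assumed encoding)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a_stream, b_stream):
    """Stochastic multiply: bitwise AND of two independent bitstreams."""
    return [a & b for a, b in zip(a_stream, b_stream)]


def sc_dot(a_vals, b_vals, length=4096, seed=0):
    """Approximate a dot product: AND per element pair, then count the ones.

    The integer counter stands in for a binary accumulation back-end.
    """
    rng = random.Random(seed)
    total_ones = 0  # binary accumulator (popcount of all product streams)
    for a, b in zip(a_vals, b_vals):
        product = sc_multiply(to_stream(a, length, rng), to_stream(b, length, rng))
        total_ones += sum(product)
    return total_ones / length  # rescale the count back to a real value


a = [0.5, 0.25, 0.75]
b = [0.5, 0.8, 0.4]
approx = sc_dot(a, b)
exact = sum(x * y for x, y in zip(a, b))
print(f"stochastic estimate: {approx:.4f}, exact: {exact:.4f}")
```

The estimate converges towards the exact dot product as the stream length grows, which is the usual accuracy-versus-latency trade-off of stochastic computing. A compressed or quasi-stochastic format would aim to reduce that cost, but how BP8 does so is not described here.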
