A Novel 8T SRAM-Based In-Memory Computing Architecture for MAC-Derived Logical Functions
Amogh K M, Sunita M S
TL;DR
The paper tackles the energy cost of data movement in von Neumann systems by proposing an SRAM-based in-memory computing architecture built on an 8×8 array of 8T cells. It decouples read and write paths to improve reliability over 6T designs and uses charge-sharing on dedicated read bit-lines to perform multi-bit MAC operations, with a MAC decoder converting the analog MAC output into an 8-bit count. This MAC count is then used to infer fundamental logic functions (AND/NAND, NOR/OR, XOR/XNOR) and 1-bit addition without extra logic circuits. Cadence Virtuoso simulations in 90 nm CMOS at 1.8 V show 8-bit MAC and derived logic operating at 142.85 MHz with a latency of 0.7 ns and a throughput of 15.8 Mops/s, at an energy cost of around 56.56 fJ/bit, highlighting the approach's potential for energy-efficient edge AI. The work demonstrates a practical route to integrating storage and computation in mature SRAM technology, enabling densely parallel MAC and logic within memory arrays.
Abstract
This paper presents an in-memory computing (IMC) architecture developed on an 8×8 array of 8T SRAM cells. The architecture supports both multi-bit parallel Multiply-Accumulate (MAC) operations and standard memory operations, with the MAC performed through charge-sharing on dedicated read bit-lines. Building on the maturity of SRAM technology, the 8T cell decouples the read and write paths, thereby overcoming the reliability limitations of prior 6T SRAM designs. A novel analog-to-digital decoding scheme converts the MAC voltage output into digital counts, which are subsequently interpreted to realize fundamental logic functions, including AND/NAND, NOR/OR, XOR/XNOR, and 1-bit addition, within the same array. Simulated in a 90 nm CMOS process at a 1.8 V supply voltage, the proposed design achieves 8-bit MAC and logical operations at a frequency of 142.85 MHz, with a latency of 0.7 ns, an energy consumption of 56.56 fJ/bit per MAC operation, and a throughput of 15.8 M operations/s.
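To make the idea of interpreting a MAC count as a logic result concrete, the following is a minimal behavioral sketch in Python. It is not the authors' circuit or decoder. It assumes that the decoded count equals the number of positions where the input bit and the stored bit are both 1, and that two 1-bit operands are stored in two cells that are both activated with an input of 1, so the count can only be 0, 1, or 2. The function names `mac_count` and `logic_from_count` are hypothetical and chosen for illustration only.

```python
def mac_count(stored_bits, input_bits):
    """Idealized digital MAC count: number of positions where the
    input bit and the stored bit are both 1 (assumed decoder output)."""
    return sum(s & x for s, x in zip(stored_bits, input_bits))


def logic_from_count(count):
    """Infer two-operand logic results from a MAC count of 0, 1, or 2.

    Assumed mapping: count == 2 means both operands are 1,
    count == 1 means exactly one is 1, count == 0 means neither is.
    """
    return {
        "AND": int(count == 2),
        "NAND": int(count != 2),
        "OR": int(count >= 1),
        "NOR": int(count == 0),
        "XOR": int(count == 1),
        "XNOR": int(count != 1),
        "SUM": int(count == 1),    # 1-bit addition: sum bit equals XOR
        "CARRY": int(count == 2),  # 1-bit addition: carry bit equals AND
    }


if __name__ == "__main__":
    # Store operands a and b in two cells and activate both (inputs = 1, 1).
    for a in (0, 1):
        for b in (0, 1):
            count = mac_count([a, b], [1, 1])
            print(a, b, count, logic_from_count(count))
```

Under these assumptions, a single count yields all six logic functions and both outputs of a half adder at once, which is consistent with the claim that no additional logic circuits are required beyond the decoder.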
