PICO-RAM: A PVT-Insensitive Analog Compute-In-Memory SRAM Macro with In-Situ Multi-Bit Charge Computing and 6T Thin-Cell-Compatible Layout

Zhiyu Chen, Ziyuan Wen, Weier Wan, Akhil Reddy Pakala, Yiwei Zou, Wei-Chen Wei, Zengyi Li, Yubei Chen, Kaiyuan Yang

TL;DR

PICO-RAM is a compact SRAM compute-in-memory (CIM) macro that is insensitive to process, voltage, and temperature (PVT) variations. It performs in-situ, multi-bit, bit-parallel matrix-vector multiplication (MVM) using charge-domain MAC units that fit a 6T-thin-cell-compatible layout. The design reuses local MOM capacitors for the DAC, MAC, shift-and-add, and ADC functions, embedding the DAC and ADC within the array to minimize area and improve accuracy. A two-phase in-situ C-DAC and an analog shift-and-add mechanism enable high linearity and robust operation across wide voltage and temperature ranges, and a dual-threshold time-domain ADC substantially reduces energy. Measured on a 65-nm prototype, PICO-RAM achieves the highest weight storage density reported among SRAM-based analog CIM designs (559 Kb/mm$^2$), strong PVT robustness, and competitive inference performance on CIFAR-10/100 and speech tasks, with energy efficiency up to 40.2 TOPS/W. This work demonstrates a practical pathway toward dense, low-power on-chip CIM accelerators for deep learning inference.

Abstract

Analog compute-in-memory (CIM) in static random-access memory (SRAM) is promising for accelerating deep learning inference by circumventing the memory wall and exploiting ultra-efficient analog low-precision arithmetic. Recent analog CIM designs attempt bit-parallel schemes for multi-bit analog Matrix-Vector Multiplication (MVM), aiming for higher energy efficiency and throughput, as well as simpler and more robust training, than conventional bit-serial methods that digitally shift-and-add multiple partial analog computing results. However, bit-parallel operations require more complex analog computations and become more sensitive to well-known analog CIM challenges, including large cell areas, inefficient and inaccurate multi-bit analog operations, and vulnerability to PVT variations. This paper presents PICO-RAM, a PVT-insensitive and compact CIM SRAM macro with charge-domain bit-parallel computation. It adopts a multi-bit thin-cell Multiply-Accumulate (MAC) unit that shares the same transistor layout as the most compact 6T SRAM cell. All analog computing modules, including digital-to-analog converters (DACs), MAC units, analog shift-and-add, and analog-to-digital converters (ADCs), reuse one set of local capacitors inside the array, performing in-situ computation to save area and enhance accuracy. A compact 8.5-bit dual-threshold time-domain ADC power-gates the main path most of the time, leading to a significant energy reduction. Our 65-nm prototype achieves the highest weight storage density of 559 Kb/mm$^2$ and exceptional robustness to temperature and voltage variations (-40 to 105 $^{\circ}$C and 0.65 to 1.2 V) among SRAM-based analog CIM designs.
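The contrast the abstract draws between bit-parallel and bit-serial MVM can be illustrated with a small numerical sketch. This is an idealized, purely digital model and not the PICO-RAM circuit: it ignores analog noise, capacitor mismatch, and ADC quantization, and it assumes unsigned integer inputs and weights. The function names (`mvm_bit_parallel`, `mvm_bit_serial`) and the `input_bits` parameter are hypothetical, chosen only for illustration.

```python
def mvm_bit_parallel(weights, x):
    """Idealized bit-parallel MVM: each output is one multi-bit dot product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mvm_bit_serial(weights, x, input_bits):
    """Idealized bit-serial MVM: one pass per input bit, then shift-and-add.

    Assumes every input is an unsigned integer below 2**input_bits.
    """
    acc = [0] * len(weights)
    for b in range(input_bits):
        # Extract the b-th bit plane of the input vector (values 0 or 1).
        x_bit = [(xi >> b) & 1 for xi in x]
        # One partial result per bit plane (a separate pass in hardware).
        partial = mvm_bit_parallel(weights, x_bit)
        # Shift-and-add: weight the partial result by 2**b and accumulate.
        acc = [a + (p << b) for a, p in zip(acc, partial)]
    return acc


if __name__ == "__main__":
    W = [[1, 2], [3, 4]]   # arbitrary example weights
    x = [5, 6]             # arbitrary 3-bit unsigned inputs
    print(mvm_bit_parallel(W, x))     # [17, 39]
    print(mvm_bit_serial(W, x, 3))    # [17, 39]
```

In this ideal model the two schemes give identical results. The difference lies in the number of passes: the bit-serial version needs `input_bits` partial computations, each followed by a digital shift-and-add, whereas the bit-parallel version produces each output in a single step, which is what demands more complex and more accurate analog computation.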

Paper Structure

This paper contains 18 sections, 7 equations, 20 figures, and 1 table.

Figures (20)

  • Figure 1: (a) BP, WBS, and BS schemes and (b) their simulated energy efficiency and CIFAR-10 accuracy (with ResNet-20) across CIM macro configurations.
  • Figure 2: Simulated SQNR and energy efficiency under different hardware configurations when (a) quantization level = 64 and (b) $N$ = 144.
  • Figure 3: Prior charge-domain shift-and-add designs for analog BP CIM, using (a) peripheral weighted capacitors and (b) C-2C ladders.
  • Figure 4: Proposed 6T-thin-cell-compatible cluster and operating waveforms.
  • Figure 5: Thin-cell MAC unit layout and integration with 6T cells.
  • ...and 15 more figures