
SiTe CiM: Signed Ternary Computing-in-Memory for Ultra-Low Precision Deep Neural Networks

Niharika Thakuria, Akul Malhotra, Sandeep K. Thirumala, Reena Elangovan, Anand Raghunathan, Sumeet K. Gupta

TL;DR

This paper addresses energy-efficient inference for ultra-low precision DNNs by introducing Signed Ternary CiM (SiTe CiM), which performs dot products directly in memory using cross-coupled bitcells to support signed ternary inputs and weights. It presents two flavors, SiTe CiM I for high performance and SiTe CiM II for improved area efficiency, and demonstrates the approach on 8T-SRAM, 3T-eDRAM, and 3T-FEMFET memories with substantial array-level gains (up to 88% lower latency and 78% lower energy) and manageable area overheads (18–34% for I and 6% for II). System-level evaluation within a ternary DNN accelerator shows up to 7× throughput improvement and up to 2.5× energy efficiency gains over near-memory baselines, highlighting significant practical impact for edge AI. The work also discusses generalization to other memory technologies and outlines design trade-offs between area, latency, and sensing schemes for signed ternary CiM.

Abstract

Ternary Deep Neural Networks (DNNs) have shown large potential for highly energy-constrained systems by virtue of their low-power operation (due to ultra-low precision) with only a mild degradation in accuracy. To enable an energy-efficient hardware substrate for such systems, we propose a compute-enabled memory design, referred to as SiTe CiM, which features computing-in-memory (CiM) of dot products between signed ternary (SiTe) inputs and weights. SiTe CiM is based on cross-coupling of two bit cells to enable CiM of dot products in the signed ternary regime. We explore SiTe CiM with 8T-SRAM, 3T-embedded DRAM (3T-eDRAM) and 3T-ferroelectric metal FET (FEMFET) memories. We propose two flavors of this technique, namely SiTe CiM I and SiTe CiM II. In SiTe CiM I, we employ two additional transistors per cell for cross-coupling, achieving fast CiM operations, albeit incurring an area overhead ranging from 18% to 34% (compared to standard ternary memories). In SiTe CiM II, four extra transistors are utilized for every 16 cells in a column, thereby incurring only a 6% area cost (but leading to slower CiM than SiTe CiM I). Based on the array analysis, our designs achieve up to 88% lower CiM latency and up to 78% CiM energy savings across the various technologies considered, as compared to their respective near-memory computing counterparts. Further, we perform system-level analysis by incorporating SiTe CiM I/II arrays in a ternary DNN accelerator and show up to 7× throughput boost and up to 2.5× energy reduction compared to near-memory ternary DNN accelerators.
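To make the operation named in the abstract concrete, the following is a minimal functional sketch of a dot product between signed ternary inputs and weights, with every value drawn from {-1, 0, +1}. It models only the arithmetic result, not the cross-coupled bit cells, the read bitlines, or the sensing scheme of SiTe CiM. The function names and the split into counts of +1 and -1 products are illustrative assumptions, not the paper's encoding.

```python
from itertools import product

TERNARY = (-1, 0, 1)  # the three signed ternary values


def scalar_truth_table():
    """Return {(input, weight): product} for all nine signed ternary pairs."""
    return {(i, w): i * w for i, w in product(TERNARY, repeat=2)}


def ternary_dot(inputs, weights):
    """Dot product of two signed ternary vectors (hypothetical helper).

    The result is formed as (number of +1 products) - (number of -1 products).
    This split is an assumption made for illustration only.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    for value in (*inputs, *weights):
        if value not in TERNARY:
            raise ValueError(f"non-ternary value: {value!r}")
    products = [i * w for i, w in zip(inputs, weights)]
    positives = products.count(1)
    negatives = products.count(-1)
    return positives - negatives


if __name__ == "__main__":
    I = [1, -1, 0, 1]
    W = [1, 1, -1, -1]
    print(ternary_dot(I, W))
```

Any scalar product with a zero input or a zero weight contributes nothing to the sum, so only the pairs where both operands are nonzero affect the result.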

Paper Structure (43 sections, 13 figures)


Figures (13)

  • Figure 1: (a) 8T-SRAM, (b) 3T-eDRAM, (c) 3T-FEMFET, (d) Generalized schematic of a bit cell with separated write/read path, (e) bit storage information and (f) bit sense/read information in 8T-SRAM, 3T-eDRAM, 3T-FEMFET.
  • Figure 2: (a) Schematic of SiTe CiM I cell. Inset shows that the underlying cell/storage element can be 8T-SRAM, 3T-eDRAM or 3T-FEMFET cell, (b) symbol of a SiTe CiM I cell.
  • Figure 3: (a) Weight, (b) input, (c) output encoding, (d) truth table for ternary scalar product computation, (e-f) examples of scalar multiplication in SiTe CiM.
  • Figure 4: (a) Column of SiTe CiM I cells showing ternary dot-product computation using input ($I$) and weight ($W$) vectors, (b) an example of worst-case input-weight combination for maximum $RBL$ discharge, (c) $RBL$ voltage vs. number of discharges.
  • Figure 5: (a) Schematic of a column of SiTe CiM II cells, (b) weight, (c) input, (d) output encoding, (e) truth table for ternary scalar product computation, (f-g) examples of scalar multiplication in SiTe CiM II, (h) dot product computation in a column of SiTe CiM II cells.
  • ...and 8 more figures