SiTe CiM: Signed Ternary Computing-in-Memory for Ultra-Low Precision Deep Neural Networks
Niharika Thakuria, Akul Malhotra, Sandeep K. Thirumala, Reena Elangovan, Anand Raghunathan, Sumeet K. Gupta
TL;DR
This paper addresses energy-efficient inference for ultra-low precision DNNs by introducing Signed Ternary CiM (SiTe CiM), which performs dot products directly in memory using cross-coupled bitcells to support signed ternary inputs and weights. It presents two flavors: SiTe CiM I for high performance and SiTe CiM II for improved area efficiency. The approach is demonstrated on 8T-SRAM, 3T-eDRAM, and 3T-FEMFET memories, with substantial array-level gains over near-memory computing (up to 88% lower CiM latency and 78% lower CiM energy) and manageable area overheads relative to standard ternary memories (18–34% for I and 6% for II). System-level evaluation within a ternary DNN accelerator shows up to 7× higher throughput and up to 2.5× lower energy than near-memory baselines, which makes the approach attractive for energy-constrained edge AI. The work also discusses generalization to other memory technologies and outlines design trade-offs among area, latency, and sensing schemes for signed ternary CiM.
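To make the arithmetic behind a signed ternary dot product concrete, the sketch below encodes each value in {-1, 0, +1} as two bits (a positive bit and a negative bit) and computes the dot product as the number of +1 products minus the number of -1 products. This is only an illustration of the math, not the SiTe CiM circuit: the (pos_bit, neg_bit) encoding and the function names are hypothetical choices made for this example.

```python
from typing import Sequence

# Hypothetical 2-bit encoding of a signed ternary value: (pos_bit, neg_bit),
# with value = pos_bit - neg_bit. The pattern (1, 1) is never produced.
ENCODE = {1: (1, 0), 0: (0, 0), -1: (0, 1)}


def encode(values: Sequence[int]) -> list[tuple[int, int]]:
    """Map each ternary value to its (pos_bit, neg_bit) pair."""
    return [ENCODE[v] for v in values]


def ternary_dot_by_counts(x: Sequence[int], w: Sequence[int]) -> int:
    """Signed ternary dot product as (#(+1) products) - (#(-1) products)."""
    if len(x) != len(w):
        raise ValueError("input and weight vectors must have equal length")
    pos_count = 0
    neg_count = 0
    for (xp, xn), (wp, wn) in zip(encode(x), encode(w)):
        # Product is +1 when the signs agree, -1 when they differ,
        # and 0 when either operand is 0 (both bits cleared).
        pos_count += (xp & wp) | (xn & wn)
        neg_count += (xp & wn) | (xn & wp)
    return pos_count - neg_count


if __name__ == "__main__":
    print(ternary_dot_by_counts([1, -1, 0, 1], [1, 1, -1, -1]))
```

Running the example prints -1: one product is +1, two are -1, and the zero input contributes nothing. Splitting the result into two counts reflects the general idea that a signed result can be formed from two unsigned accumulations.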
Abstract
Ternary deep neural networks (DNNs) have shown large potential for highly energy-constrained systems: their ultra-low precision enables low-power operation with only a mild degradation in accuracy. To provide an energy-efficient hardware substrate for such systems, we propose a compute-enabled memory design, referred to as SiTe CiM, which features computing-in-memory (CiM) of dot products between signed ternary (SiTe) inputs and weights. SiTe CiM cross-couples two bit cells to enable CiM of dot products in the signed ternary regime. We explore SiTe CiM with 8T-SRAM, 3T-embedded DRAM (3T-eDRAM) and 3T-ferroelectric metal FET (FEMFET) memories. We propose two flavors of this technique, namely SiTe CiM I and SiTe CiM II. SiTe CiM I employs two additional transistors per cell for cross-coupling, achieving fast CiM operations at the cost of an area overhead ranging from 18% to 34% (compared to standard ternary memories). SiTe CiM II uses four extra transistors for every 16 cells in a column, incurring only a 6% area cost but yielding slower CiM than SiTe CiM I. Based on array-level analysis, our designs achieve up to 88% lower CiM latency and up to 78% lower CiM energy across the technologies considered, compared to their respective near-memory computing counterparts. Further, we perform system-level analysis by incorporating SiTe CiM I/II arrays in a ternary DNN accelerator and show up to 7× higher throughput and up to 2.5× lower energy compared to near-memory ternary DNN accelerators.
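The two flavors differ in how the cross-coupling transistors are amortized. The sketch below simply counts the extra transistors in one column under the figures stated in the abstract (two per cell for SiTe CiM I, four per group of 16 cells for SiTe CiM II). It counts transistors only; the reported area overheads (18% to 34% versus 6%) depend on the cell layouts and are not derived from this count. The column height of 64 and the rule that a partial group still needs its own four transistors are assumptions made for illustration.

```python
import math


def extra_transistors_site_cim_i(num_cells: int) -> int:
    """SiTe CiM I: two additional transistors per cell (as stated in the abstract)."""
    return 2 * num_cells


def extra_transistors_site_cim_ii(num_cells: int, group: int = 16) -> int:
    """SiTe CiM II: four extra transistors for every `group` cells in a column.

    Assumption: a trailing partial group is rounded up and still
    needs its own set of four transistors.
    """
    return 4 * math.ceil(num_cells / group)


if __name__ == "__main__":
    column = 64  # hypothetical column height
    i_extra = extra_transistors_site_cim_i(column)
    ii_extra = extra_transistors_site_cim_ii(column)
    print(f"SiTe CiM I:  {i_extra} extra ({i_extra / column:.2f} per cell)")
    print(f"SiTe CiM II: {ii_extra} extra ({ii_extra / column:.2f} per cell)")
```

For a 64-cell column this gives 128 versus 16 extra transistors (2 versus 0.25 per cell), which illustrates why SiTe CiM II trades CiM speed for a much smaller overhead.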
