A Hybrid-Domain Floating-Point Compute-in-Memory Architecture for Efficient Acceleration of High-Precision Deep Neural Networks

Zhiqiang Yi, Yiwen Liang, Weidong Cao

TL;DR

This work addresses the energy challenge of high-precision deep neural network acceleration by introducing a hybrid-domain, SRAM-based compute-in-memory macro for floating-point operations. By decomposing mantissa multiplication into a compute-light sub-ADD and a compute-heavy sub-MUL, the authors implement digital CIM for accurate addition and analog CIM for efficient multiplication, anchored by a novel Mantissa MAC with per-cell local computing. Circuit-level simulations demonstrate an energy efficiency gain of about $1.53\times$ over fully digital baselines while preserving lossless accuracy on FP8 models, supported by a 3-bit Flash ADC and switched-capacitor accumulation. The approach offers a practical path to energy-efficient, high-precision DNN inference and potential training acceleration at the edge, with scalable area implications.
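The decomposition described above can be illustrated with a small numerical sketch. It assumes normalized mantissas of the form $1 + m$, so that $(1 + m_x)(1 + m_w) = (1 + m_x + m_w) + m_x m_w$. Mapping the first term to the sub-ADD and the second to the sub-MUL, as well as the function names and the 3-bit mantissa width, are illustrative assumptions rather than the authors' circuit.

```python
from fractions import Fraction

MANT_BITS = 3  # assumption: E4M3-style 3-bit mantissa fraction


def split_mantissa_product(mx_bits: int, mw_bits: int, mant_bits: int = MANT_BITS):
    """Split (1 + mx) * (1 + mw) into an additive and a multiplicative term.

    mx_bits and mw_bits are the raw mantissa fields (integers); exact
    fractions are used so the identity can be checked without rounding.
    """
    scale = 1 << mant_bits
    mx = Fraction(mx_bits, scale)
    mw = Fraction(mw_bits, scale)
    sub_add = 1 + mx + mw  # compute-light term (hypothetically the digital path)
    sub_mul = mx * mw      # compute-heavy term (hypothetically the analog path)
    return sub_add, sub_mul


def mantissa_product(mx_bits: int, mw_bits: int, mant_bits: int = MANT_BITS) -> Fraction:
    """Recombine the two terms into the full mantissa product."""
    sub_add, sub_mul = split_mantissa_product(mx_bits, mw_bits, mant_bits)
    return sub_add + sub_mul
```

For example, with mantissa fields 4 and 2 (that is, $1.5 \times 1.25$), the sketch yields a sub-ADD term of $7/4$ and a sub-MUL term of $1/8$, which sum to $15/8 = 1.875$.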

Abstract

Compute-in-memory (CIM) has shown significant potential in efficiently accelerating deep neural networks (DNNs) at the edge, particularly in speeding up quantized models for inference applications. Recently, there has been growing interest in developing floating-point-based CIM macros to improve the accuracy of high-precision DNN models, including both inference and training tasks. Yet current implementations rely primarily on digital methods, leading to substantial power consumption. This paper introduces a hybrid-domain CIM architecture that integrates analog and digital CIM within the same memory cell to efficiently accelerate high-precision DNNs. Specifically, we develop area-efficient circuits and energy-efficient analog-to-digital conversion techniques to realize this architecture. Comprehensive circuit-level simulations demonstrate the energy efficiency and lossless accuracy of the proposed design on benchmarks.

Paper Structure

This paper contains 16 sections, 4 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: (a) Illustration of the conventional FP SRAM CIM architecture. $X_{E,i}$ and $W_{E,i}$ are the exponent parts of the activation and weight; $X_{M,i}$ and $W_{M,i}$ are the mantissa parts of the activation and weight. ①, ②, and ③ are circuit-level representations of the steps in Eq. \ref{eq: fp_add}. (b) FP8 formats (E4M3 and E5M2).
  • Figure 2: (a) Architecture overview of the proposed hybrid-domain FP CIM macro, (b) Hybrid-domain FP SRAM CIM mantissa unit, (c) Switched-capacitor array with the Flash ADC, (d) 6T SRAM, (e) pseudo XOR gate, (f) pseudo AND gate, LAC for sub-add (blue line) and MUL for ACIM (orange line).
  • Figure 3: Computation error across multiple combinations of 4-bit inputs and 4-bit weights.
  • Figure 4: Mantissa MAC unit layouts: (a) Single mantissa computing unit, (b) Layout of 4 cells.
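
Figure 1(b) refers to the two FP8 formats, E4M3 and E5M2. As a rough sketch of how such an 8-bit word maps to a value, the decoder below assumes the commonly used layout of one sign bit, then the exponent field, then the mantissa field, with an exponent bias of $2^{e-1} - 1$ (7 for E4M3, 15 for E5M2) and subnormals when the exponent field is zero. The function name is hypothetical, and NaN and infinity encodings are deliberately not handled.

```python
def decode_fp8(byte: int, exp_bits: int, mant_bits: int) -> float:
    """Decode a finite FP8 value from its 8-bit encoding.

    Assumption: sign | exponent | mantissa layout with bias 2**(exp_bits-1) - 1.
    Special values (NaN, infinity) are NOT recognized and are decoded as if
    they were ordinary numbers.
    """
    if exp_bits + mant_bits != 7:
        raise ValueError("FP8 needs 1 sign bit plus 7 exponent/mantissa bits")
    bias = (1 << (exp_bits - 1)) - 1
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> mant_bits) & ((1 << exp_bits) - 1)
    mant = byte & ((1 << mant_bits) - 1)
    frac = mant / (1 << mant_bits)
    if exp == 0:
        # Subnormal: no implicit leading one, fixed exponent of 1 - bias.
        return sign * frac * 2.0 ** (1 - bias)
    # Normal: implicit leading one.
    return sign * (1.0 + frac) * 2.0 ** (exp - bias)
```

Under these assumptions, `decode_fp8(0x38, 4, 3)` and `decode_fp8(0x3C, 5, 2)` both give 1.0, and `decode_fp8(0x7E, 4, 3)` gives 448.0, the largest finite E4M3 value in that convention.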