A Hybrid-Domain Floating-Point Compute-in-Memory Architecture for Efficient Acceleration of High-Precision Deep Neural Networks
Zhiqiang Yi, Yiwen Liang, Weidong Cao
TL;DR
This work addresses the energy cost of high-precision deep neural network acceleration by introducing a hybrid-domain, SRAM-based compute-in-memory (CIM) macro for floating-point operations. The authors decompose mantissa multiplication into a compute-light sub-ADD and a compute-heavy sub-MUL, then use digital CIM for accurate addition and analog CIM for efficient multiplication. The design is anchored by a novel Mantissa MAC with per-cell local computing, and it relies on a 3-bit Flash ADC and switched-capacitor accumulation. Circuit-level simulations show an energy efficiency gain of about $1.53\times$ over fully digital baselines with no accuracy loss on FP8 models. The approach offers a practical path to energy-efficient, high-precision DNN inference at the edge, and potentially to training acceleration, with scalable area implications.
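The sub-ADD / sub-MUL split can be illustrated with a small numerical sketch. It assumes the usual normalized form, where each mantissa is $1 + f$ with a fractional part $f$, so that $(1 + f_x)(1 + f_w) = (1 + f_x + f_w) + f_x f_w$. The first group needs only additions and the second is a fraction-by-fraction product. The 3-bit mantissa width (as in an E4M3-style FP8 format) and the function names are assumptions made for illustration and are not taken from the paper.

```python
from fractions import Fraction

# Assumption: 3 fractional mantissa bits, as in an E4M3-style FP8 format.
MANT_BITS = 3


def decompose_mantissa_product(mx: int, mw: int, bits: int = MANT_BITS):
    """Split (1 + mx/2^bits) * (1 + mw/2^bits) into two partial terms.

    mx and mw are the stored (integer) mantissa fields of the two operands.
    Returns (sub_add, sub_mul) as exact fractions:
      sub_add = 1 + fx + fw   (additions only, the compute-light part)
      sub_mul = fx * fw       (a true product, the compute-heavy part)
    """
    scale = 1 << bits
    fx = Fraction(mx, scale)
    fw = Fraction(mw, scale)
    sub_add = 1 + fx + fw
    sub_mul = fx * fw
    return sub_add, sub_mul


def decomposition_is_exact(bits: int = MANT_BITS) -> bool:
    """Check sub_add + sub_mul against the direct product for every mantissa pair."""
    scale = 1 << bits
    for mx in range(scale):
        for mw in range(scale):
            sub_add, sub_mul = decompose_mantissa_product(mx, mw, bits)
            exact = (1 + Fraction(mx, scale)) * (1 + Fraction(mw, scale))
            if sub_add + sub_mul != exact:
                return False
    return True


if __name__ == "__main__":
    # Example: mantissa fields 0b100 (f = 0.5) and 0b010 (f = 0.25).
    print(decompose_mantissa_product(4, 2))
    print(decomposition_is_exact())
```

Because $f_x$ and $f_w$ are both below 1, the sub-MUL term is always smaller than either fraction. This sketch only verifies the algebraic identity; it does not model the analog circuit, the ADC, or any quantization error.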
Abstract
Compute-in-memory (CIM) has shown significant potential for efficiently accelerating deep neural networks (DNNs) at the edge, particularly for speeding up quantized models in inference applications. Recently, there has been growing interest in floating-point CIM macros that support high-precision DNN models with better accuracy, for both inference and training. Yet current implementations rely primarily on digital methods, which leads to substantial power consumption. This paper introduces a hybrid-domain CIM architecture that integrates analog and digital CIM within the same memory cell to efficiently accelerate high-precision DNNs. Specifically, we develop area-efficient circuits and energy-efficient analog-to-digital conversion techniques to realize this architecture. Comprehensive circuit-level simulations demonstrate the energy efficiency and lossless accuracy of the proposed design on benchmarks.
