
Investigating Energy Bounds of Analog Compute-in-Memory with Local Normalization

Brian Rojkov, Shubham Ranjan, Derek Wright, Manoj Sachdev

TL;DR

This work addresses the energy efficiency of analog Compute-in-Memory (CIM) for edge AI, focusing on floating-point workloads that need wide dynamic range. It introduces the Gain-Ranging MAC (GR-MAC), a mixed-signal architecture that normalizes mantissas locally and applies exponent-weighted gain ranging, decoupling input dynamic range from precision and keeping the MAC in a low-precision analog regime. The authors present architectural designs (unit/row/INT normalization variants) and an energy-modelling analysis. The analysis shows that the input dynamic range can be extended by 4 bits without increasing energy at a 35 dB SQNR standard, and that the ADC resolution requirement becomes invariant to input distribution assumptions, giving an upper bound about 1.5 bits below the conventional lower bound. Together, these results point to improved energy scaling for floating-point CIM on modern AI workloads such as Large Language Models.

Abstract

Modern edge AI workloads demand maximum energy efficiency, motivating the pursuit of analog Compute-in-Memory (CIM) architectures. Simultaneously, the popularity of Large Language Models (LLMs) drives the adoption of low-bit floating-point formats that prioritize dynamic range. However, conventional direct-accumulation CIM accommodates floating-point values by normalizing them to a shared, widened fixed-point scale. Consequently, hardware resolution is dictated by the input's dynamic range rather than its precision, and energy consumption is dominated by the ADC. We address this limitation by introducing local normalization for each input, weight, and multiply-accumulate (MAC) output via a Gain-Ranging MAC (GR-MAC). Normalization overhead is handled by low-power digital logic, enabling the computationally expensive MAC operation to remain in the energy-efficient low-precision analog regime. Energy modelling shows that the addition of a gain-ranging stage to the MAC enables a 4-bit increase in input dynamic range without increased energy consumption at a 35 dB SQNR standard. Additionally, the ADC resolution requirement becomes invariant to input distribution assumptions, allowing construction of an upper bound with a 1.5-bit reduction compared to the conventional lower bound. These results establish a pathway towards unlocking favourable energy scaling trends of analog CIM for modern AI workloads.
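
The resolution penalty of global normalization can be illustrated with simple bit-width arithmetic. The sketch below is not the paper's energy model. It assumes a hypothetical floating-point format described only by `exponent_bits` and `mantissa_bits`, treats every exponent code as a normal value, and ignores sign, subnormals, and special values. All function names are illustrative.

```python
def significand_bits(mantissa_bits: int) -> int:
    """Magnitude bits of one normalized significand (implicit leading one included)."""
    return mantissa_bits + 1


def global_fixed_point_bits(exponent_bits: int, mantissa_bits: int) -> int:
    """Magnitude bits needed once all values share a single fixed-point scale.

    Assumption: exponents span 0 .. 2**exponent_bits - 1, so the largest value
    is shifted left by that many positions relative to the smallest.
    """
    max_shift = 2 ** exponent_bits - 1
    return significand_bits(mantissa_bits) + max_shift


def to_shared_scale(significand: int, exponent: int) -> int:
    """Align one value to the shared scale by shifting its significand."""
    return significand << exponent


if __name__ == "__main__":
    e_bits, m_bits = 3, 2  # hypothetical example format, chosen for illustration
    print("bits per value (precision):", significand_bits(m_bits))
    print("bits on shared scale (dynamic range):", global_fixed_point_bits(e_bits, m_bits))
    largest = to_shared_scale(0b111, 2 ** e_bits - 1)
    print("largest aligned value needs", largest.bit_length(), "bits")
```

Under these assumptions, a format with a 3-bit exponent and a 2-bit mantissa needs 10 magnitude bits on the shared scale even though each value carries only 3 bits of precision. This is the sense in which resolution follows dynamic range rather than precision when normalization is global.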

Paper Structure (30 sections, 1 equation, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Classification of CMOS-SRAM-based CIMs based on compute domain and quantization type. While emerging Non-Uniform/Floating-Point (pink) works offer superior information efficiency over Uniform/Integer (blue), prior implementations suffer from logic complexity, approximation, or overheads introduced by conversion and long-tailed data distributions. Our work (bottom) introduces a direct, native floating-point MAC to eliminate these bottlenecks.
  • Figure 2: Conventional CIM based on (a) digital and (b) charge-based analog techniques. (c) Global normalization procedure used to align floating-point inputs for processing in conventional CIM arrays.
  • Figure 3: The proposed Gain-Ranging MAC and CIM architecture template for design space exploration. Global normalization (dashed lines) is optionally included if more input dynamic range is required than the CIM array's native capability.
  • Figure 4: (a) Conventional and (b) proposed MAC unit with periphery, enabling (c) a reduction in data converter precision, and (d) an illustration of the magnitude-based binning of normalization. The effect of the redundant precision that conventional FP-to-INT architectures require in the DAC and ADC is shown as bars, filled to represent the information content of the quantized inputs as a fraction of the capacity dictated by the DAC and ADC requirements. A normal distribution clipped to $4\sigma$ is chosen for the input $x$ and weight $W$ for illustration purposes; the proposed GR-MAC's benefit is not strictly reliant on this input distribution. Similarly, $N_{\mathrm{R}}=32$ and the FP6 data format are chosen as an example for illustration. The ADC resolutions are specified according to the analysis in Section \ref{sec:anal}.
  • Figure 5: Switched-capacitor divider with variable output capacitance used for GR-MAC. Switch and floating top- and bottom-plate parasitics are modeled to a first-order approximation as lumped capacitances $C_{\mathrm{p}1}$ and $C_{\mathrm{p}2}$ at the floating nodes.
  • ...and 4 more figures
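
To give a feel for the 35 dB SQNR standard and the $4\sigma$-clipped normal distribution mentioned in the Figure 4 caption, the following sketch measures the SQNR of a plain uniform quantizer on such data. It is an illustration only and does not reproduce the paper's ADC analysis or energy model. The mid-rise quantizer, the sample count, the seed, and all function names are assumptions made for this example.

```python
import math
import random


def clipped_normal_samples(count: int, clip: float = 4.0, seed: int = 0) -> list[float]:
    """Draw unit-variance normal samples and clip them to [-clip, +clip]."""
    rng = random.Random(seed)
    return [max(-clip, min(clip, rng.gauss(0.0, 1.0))) for _ in range(count)]


def quantize_uniform(x: float, bits: int, clip: float = 4.0) -> float:
    """Mid-rise uniform quantizer with 2**bits levels over [-clip, +clip]."""
    levels = 2 ** bits
    step = 2.0 * clip / levels
    index = int(math.floor((x + clip) / step))
    index = max(0, min(levels - 1, index))  # keep x == +clip inside the top level
    return -clip + (index + 0.5) * step


def sqnr_db(samples: list[float], bits: int, clip: float = 4.0) -> float:
    """Signal-to-quantization-noise ratio in dB, measured on the given samples."""
    signal_power = sum(x * x for x in samples) / len(samples)
    noise_power = sum((x - quantize_uniform(x, bits, clip)) ** 2 for x in samples) / len(samples)
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    data = clipped_normal_samples(20000)
    for bits in range(4, 10):
        print(f"{bits} bits: {sqnr_db(data, bits):.1f} dB")
```

Each added bit of a uniform quantizer buys roughly 6 dB of SQNR, so a fixed SQNR target translates directly into a resolution requirement once the input distribution and clipping range are fixed. That dependence on the assumed distribution is what the abstract says the gain-ranging approach removes from the ADC requirement.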