AFPR-CIM: An Analog-Domain Floating-Point RRAM-based Compute-In-Memory Architecture with Dynamic Range Adaptive FP-ADC

Haobo Liu, Zhengyang Qian, Wei Wu, Hongwei Ren, Zhiwei Liu, Leibin Ni

TL;DR

This work targets the high power cost of FP8 processing in edge AI. It introduces AFPR-CIM, an analog-domain compute-in-memory architecture based on RRAM that performs INT-domain MAC inside the memory array and interfaces with FP8 data through an FP-DAC on the input side and an FP-ADC on the output side. The FP-DAC reconstructs FP8 (E2M5) activation codes into analog values, and the dynamic-range adaptive FP-ADC converts the analog computation results back into FP codes. The architecture achieves 19.89 TFLOPS/W energy efficiency and 1474.56 GOPS throughput, outperforming a traditional FP8 digital accelerator, a digital FP-CIM, and an analog INT8-CIM in energy efficiency, while preserving the accuracy advantage of FP8 over INT8 in representative networks. The results support the viability of FP8 CIM with adaptive-range readout for low-power, high-throughput AI inference at the edge.
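To make the E2M5 notation concrete, here is a minimal Python sketch of an 8-bit code with 1 sign bit, 2 exponent bits, and 5 mantissa bits. The bit layout follows from the name E2M5, but everything else is an assumption for illustration and is not taken from the paper: the exponent bias (`BIAS = 1`), IEEE-style subnormals at exponent 0, the absence of Inf/NaN codes, and the helper names `decode_e2m5` and `encode_e2m5`.

```python
# Illustrative E2M5 codec. BIAS, subnormal handling, and the lack of
# Inf/NaN codes are assumptions, not details confirmed by the paper.
BIAS = 1          # hypothetical exponent bias
MANT_BITS = 5     # "M5": five mantissa bits
EXP_BITS = 2      # "E2": two exponent bits


def decode_e2m5(code: int) -> float:
    """Decode an 8-bit code laid out as [sign | exponent(2) | mantissa(5)]."""
    sign = (code >> (EXP_BITS + MANT_BITS)) & 0x1
    exp = (code >> MANT_BITS) & ((1 << EXP_BITS) - 1)
    mant = code & ((1 << MANT_BITS) - 1)
    frac = mant / (1 << MANT_BITS)
    if exp == 0:
        # Assumed subnormal range: no implicit leading one.
        mag = frac * 2.0 ** (1 - BIAS)
    else:
        # Normal range: implicit leading one.
        mag = (1.0 + frac) * 2.0 ** (exp - BIAS)
    return -mag if sign else mag


def encode_e2m5(x: float) -> int:
    """Return the code whose decoded value is nearest to x.

    Brute force over all 256 codes: slow but easy to verify.
    Out-of-range inputs saturate to the largest magnitude.
    """
    return min(range(256), key=lambda c: abs(decode_e2m5(c) - x))


if __name__ == "__main__":
    for value in (0.0, 0.4, 1.0, -1.5, 3.3, 100.0):
        code = encode_e2m5(value)
        print(f"{value:>7} -> 0b{code:08b} -> {decode_e2m5(code)}")
```

Under these assumptions, the step between neighboring codes doubles with each exponent increment. This is the property that lets a floating-point readout cover a wide output range with a fixed number of mantissa bits.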

Abstract

Power consumption has become the major concern in neural network accelerators for edge devices. The novel non-volatile-memory (NVM) based computing-in-memory (CIM) architecture has shown great potential for better energy efficiency. However, most recent NVM-CIM solutions focus on fixed-point calculation and are not applicable to floating-point (FP) processing. In this paper, we propose an analog-domain floating-point CIM architecture (AFPR-CIM) based on resistive random-access memory (RRAM). A novel adaptive dynamic-range FP-ADC is designed to convert the analog computation results into FP codes. Output current with a high dynamic range is converted to a normalized voltage range for readout, which prevents precision loss while keeping power consumption low. A novel FP-DAC is also implemented, which reconstructs FP digital codes into analog values to perform analog computation. The proposed AFPR-CIM architecture enables neural network acceleration with FP8 (E2M5) activation for better accuracy and energy efficiency. Evaluation results show that AFPR-CIM can achieve 19.89 TFLOPS/W energy efficiency and 1474.56 GOPS throughput. Compared to a traditional FP8 accelerator, a digital FP-CIM, and an analog INT8-CIM, this work achieves 4.135x, 5.376x, and 2.841x higher energy efficiency, respectively.
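The improvement factors above can be turned into approximate baseline efficiencies with simple arithmetic. The sketch below assumes each factor is defined as AFPR-CIM efficiency divided by baseline efficiency. The resulting baseline numbers are derived here, not reported in this section, so treat them as rough back-of-the-envelope values.

```python
# Back-of-the-envelope check: implied baseline efficiency = AFPR-CIM / gain.
# Assumption: each reported factor is (AFPR-CIM efficiency) / (baseline efficiency).
AFPR_EFFICIENCY = 19.89  # TFLOPS/W, from the abstract

# Improvement factors quoted in the abstract.
gains = {
    "traditional FP8 accelerator": 4.135,
    "digital FP-CIM": 5.376,
    "analog INT8-CIM": 2.841,
}

# Divide the AFPR-CIM figure by each factor to recover the implied baseline.
implied_baselines = {name: AFPR_EFFICIENCY / gain for name, gain in gains.items()}

if __name__ == "__main__":
    for name, eff in implied_baselines.items():
        # The INT8 baseline would be in TOPS/W rather than TFLOPS/W.
        print(f"{name}: about {eff:.2f} T(FL)OPS/W")
```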

Paper Structure

This paper contains 13 sections, 5 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: (a) Data flow. (b) The proposed AFPR-CIM architecture.
  • Figure 2: (a) Architecture of dynamic range adaptive FP-ADC. (b) Dynamic range adaptive process. (c) FP conversion process.
  • Figure 3: Architecture of Input FP-DAC.
  • Figure 4: Mapping method for the fully connected layer and the convolution layer on the proposed CIM Macros.
  • Figure 5: (a) Transient simulation results of FP-ADC. (b) Linearity analysis of FP-DAC.
  • ...and 1 more figure