Algorithm-hardware co-design for Energy-Efficient A/D conversion in ReRAM-based accelerators

Chenguang Zhang, Zhihang Yuan, Xingchen Li, Guangyu Sun

TL;DR

The paper addresses the high energy cost of analog-to-digital converters in ReRAM-based processing-in-memory accelerators for deep neural networks. It introduces an algorithm-hardware co-design around Twin Range quantization (TRQ), a configurable, non-analog modification to the SAR ADC that exploits skewed crossbar bit-line distributions to reduce A/D operations without sacrificing accuracy. The approach includes a hardware design, a decoding-friendly coding scheme, and a layer-wise parameter calibration strategy that preserves DNN flexibility and avoids retraining. Empirical results show substantial ADC power savings of $1.6 \sim 2.3\times$ and overall energy reductions of $42\% \sim 62\%$ across several networks and datasets, demonstrating practical impact for energy-efficient ReRAM accelerators.

Abstract

Deep neural networks (DNNs) are widely deployed in many fields. Owing to the in-situ computation (processing-in-memory) capability of the Resistive Random Access Memory (ReRAM) crossbar, ReRAM-based accelerators show potential for accelerating DNNs with low power and high performance. However, despite this power advantage, such accelerators suffer from the high power consumption of peripheral circuits, especially the Analog-to-Digital Converter (ADC), which accounts for over 60 percent of total power consumption. This problem prevents ReRAM-based accelerators from achieving higher efficiency. Some redundant analog-to-digital conversion operations do not contribute to maintaining inference accuracy, and such operations can be eliminated by modifying the ADC searching logic. Based on these observations, we propose an algorithm-hardware co-design method and explore the co-design approach in both hardware design and quantization algorithms. First, we focus on the distribution of the outputs along the crossbar's bit-lines and identify the fine-grained redundant ADC sampling bits. To further compress ADC bits, we propose a hardware-friendly quantization method and coding scheme, in which different quantization strategies are applied to the partial results in different intervals. To support the two features above, we propose a lightweight architectural design based on the SAR ADC. It is worth mentioning that our method is not only more energy efficient but also retains the flexibility of the algorithm. Experiments demonstrate that our method reduces ADC power by about $1.6 \sim 2.3 \times$.
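
The abstract's idea of saving conversions by changing the SAR search logic can be illustrated with a small sketch. This is not the paper's TRQ circuit or coding scheme: the function names, the `split` threshold, and the bit allocations are hypothetical, chosen only to show why searching a narrow, densely populated range with fewer bits needs fewer comparisons than a full-resolution search. It assumes an ideal comparator and a unipolar input in `[0, full_scale)`.

```python
def sar_convert(x, n_bits, full_scale):
    """Conventional SAR binary search: one comparison per output bit."""
    code = 0
    comparisons = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)                       # tentatively set this bit
        threshold = trial * full_scale / (1 << n_bits)  # ideal DAC output for the trial code
        comparisons += 1
        if x >= threshold:
            code = trial                                # keep the bit
    return code, comparisons


def two_range_convert(x, full_scale, split, fine_bits, coarse_bits):
    """Hypothetical two-range search (illustration only, not the paper's TRQ).

    One comparison against `split` selects the range; a short SAR search
    then runs inside the selected range only.
    """
    comparisons = 1                                     # range-select comparison
    if x < split:
        code, c = sar_convert(x, fine_bits, split)      # small values, fine step
        return ("fine", code), comparisons + c
    code, c = sar_convert(x - split, coarse_bits, full_scale - split)
    return ("coarse", code), comparisons + c            # large values, coarse step


if __name__ == "__main__":
    # Assumed example values; an 8-bit baseline versus a 4-bit/3-bit split.
    for x in (0.05, 0.3, 0.6):
        print(x, sar_convert(x, 8, 1.0), two_range_convert(x, 1.0, 0.125, 4, 3))
```

In this toy setup the baseline always spends 8 comparisons per sample, while the two-range variant spends at most 5. Whether accuracy survives such a reduction depends on the bit-line output distribution and on the layer-wise calibration that the paper describes, neither of which is modeled here.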

Paper Structure

This paper contains 25 sections, 10 equations, 7 figures, and 1 algorithm.

Figures (7)

  • Figure 1: Mapping convolutional layers. $K_w, K_i$ are the bit-width of the weight and input activation respectively.
  • Figure 2: Conventional SAR ADC with (a) uniform and (b) non-uniform grid
  • Figure 3: (a) Distribution of the output of crossbar's BLs. (b) Twin ranges quantization.
  • Figure 4: (a) DAC output with two searching strategies. (b) The bit mapping of the ADC output code.
  • Figure 5: Overall architecture
  • ...and 2 more figures