A Blueprint for Precise and Fault-Tolerant Analog Neural Networks

Cansu Demirkiran, Lakshmi Nair, Darius Bunandar, Ajay Joshi

TL;DR

This study demonstrates that an RNS-based approach achieves ≥99% of FP32 accuracy for DNN inference using 6-bit integer arithmetic and accuracy comparable to FP32 for DNN training using 7-bit integer arithmetic. It also presents a fault-tolerant dataflow that uses redundant RNS to protect the computation against the noise and errors inherent in analog hardware.

Abstract

Analog computing has reemerged as a promising avenue for accelerating deep neural networks (DNNs) due to its potential to overcome the energy efficiency and scalability challenges posed by traditional digital architectures. However, achieving high precision and DNN accuracy using such technologies is challenging, as high-precision data converters are costly and impractical. In this paper, we address this challenge by using the residue number system (RNS). RNS allows composing high-precision operations from multiple low-precision operations, thereby eliminating the information loss caused by the limited precision of the data converters. Our study demonstrates that analog accelerators utilizing the RNS-based approach can achieve ${\geq}99\%$ of FP32 accuracy for state-of-the-art DNN inference using data converters with only $6$-bit precision whereas a conventional analog core requires more than $8$-bit precision to achieve the same accuracy in the same DNNs. The reduced precision requirements imply that using RNS can reduce the energy consumption of analog accelerators by several orders of magnitude while maintaining the same throughput and precision. Our study extends this approach to DNN training, where we can efficiently train DNNs using $7$-bit integer arithmetic while achieving accuracy comparable to FP32 precision. Lastly, we present a fault-tolerant dataflow using redundant RNS error-correcting codes to protect the computation against noise and errors inherent within an analog accelerator.
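To make the central idea concrete, the following is a minimal sketch of how a high-precision dot product can be composed from several low-precision modular dot products and then recovered with the Chinese remainder theorem. The moduli set `(63, 62, 61, 59)`, the function names, and the signed-range convention are assumptions chosen for illustration (pairwise coprime values whose residues fit in 6 bits); they are not taken from the paper, and the sketch models only the arithmetic, not the analog hardware or its data converters.

```python
from math import prod

# Hypothetical moduli set: pairwise coprime, every residue fits in 6 bits.
MODULI = (63, 62, 61, 59)


def to_rns(x, moduli=MODULI):
    """Map an integer to its tuple of residues (Python's % is non-negative)."""
    return tuple(x % m for m in moduli)


def rns_dot(xs, ws, moduli=MODULI):
    """Compute a dot product independently in each residue channel.

    Every operand and every running sum stays below its modulus, so each
    channel only ever handles low-precision (here 6-bit) values.
    """
    out = []
    for m in moduli:
        acc = 0
        for x, w in zip(xs, ws):
            acc = (acc + (x % m) * (w % m)) % m
        out.append(acc)
    return tuple(out)


def from_rns(residues, moduli=MODULI, signed=True):
    """Recover the integer from its residues via the Chinese remainder theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # pow(..., -1, m): modular inverse
    total %= big_m
    # Interpret the upper half of the range as negative values.
    if signed and total > big_m // 2:
        total -= big_m
    return total


if __name__ == "__main__":
    xs = [((7 * i) % 64) - 32 for i in range(128)]  # 6-bit signed inputs
    ws = [((13 * i + 5) % 64) - 32 for i in range(128)]
    print(from_rns(rns_dot(xs, ws)), sum(x * w for x, w in zip(xs, ws)))
```

The reconstruction is exact as long as the true result lies within the range covered by the product of the moduli. With these assumed moduli the product is roughly $2^{23.7}$, which comfortably covers a length-128 dot product of 6-bit signed operands.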

Paper Structure

This paper contains 19 sections, 21 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Inference accuracy versus vector size for varying data bit-width in a conventional analog core. a Inference accuracy for a two-layer CNN classifying handwritten digits from the MNIST dataset. b Inference accuracy for ResNet50 classifying images from the ImageNet dataset evaluated in an analog core with varying precision $b$ and vector sizes $h$. For both a and b, $b$-bit precision means $b = b_{\text{DAC}} = b_{\text{ADC}} < b_{\text{out}}$, where $b$ varies between $2$ and $8$.
  • Figure 2: Comparison of the RNS-based and regular fixed-point analog approaches. a The distribution of average error observed at the output of a dot product performed with the RNS-based analog approach (pink) and the LP regular fixed-point analog approach (cyan). Error is defined as the distance from the result calculated in FP32. The experiments are repeated for 10,000 randomly generated vector pairs with vector size $h=128$. b Energy consumption of data converters (i.e., DACs and ADCs) per dot product for the RNS-based analog approach (pink) and the LP (cyan) and HP (dark blue) regular fixed-point analog approaches. See Methods for the energy estimation methodology.
  • Figure 3: Accuracy performance of the RNS-based analog core. a Inference accuracy of regular fixed-point (LP) and RNS-based cores (see Table \ref{table:moduli-sets}) on MLPerf (Inference: Datacenters) benchmarks. The accuracy numbers are normalized to the FP32 accuracy. b-d Loss during training for FP32 and RNS-based approaches with varying moduli bit-width. ResNet50 (b) is trained from scratch for 90 epochs using the SGD optimizer with momentum. BERT-Large (c) and OPT-125M (d) are fine-tuned from pre-trained models using the Adam optimizer with a linear learning rate scheduler, for 2 and 3 epochs, respectively. All inference and training experiments use FP32 for all non-GEMM operations.
  • Figure 4: RNS-based analog GEMM dataflow. The operation is shown for a moduli set $\mathcal{M} = \{m_1, \dots, m_{n}\}$. The $n$ $h\times h$ analog MVM units are represented as generic blocks. The dataflow is agnostic of the technology.
  • Figure 5: Calculated output error probability ($p_{\text{err}}$) versus single residue error probability ($p$). a-c $p_{\text{err}}$ for one (a), two (b), and infinite (c) error correction attempts and a varying number of redundant moduli $(k)$.
  • ...and 2 more figures
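
The redundant-RNS protection mentioned in the abstract and in the Figure 5 caption can be illustrated with a small sketch. It relies on a textbook property of redundant RNS codes: when the redundant moduli are larger than the non-redundant ones, a corrupted residue pushes the reconstructed value outside the legitimate range, and with two redundant moduli a single corrupted residue can be located by leaving out one residue at a time. The moduli, the function names, and the leave-one-out decoding loop are assumptions made for illustration; this is not necessarily the decoding procedure used in the paper, and the sketch handles only non-negative values.

```python
from math import prod

# Hypothetical moduli, pairwise coprime, residues fit in 6 bits.
INFO_MODULI = (55, 59, 61)     # non-redundant moduli (define the legitimate range)
REDUNDANT_MODULI = (63, 64)    # redundant moduli, larger than every info modulus
ALL_MODULI = INFO_MODULI + REDUNDANT_MODULI
LEGIT_RANGE = prod(INFO_MODULI)


def encode(x):
    """Represent a non-negative integer below LEGIT_RANGE by all residues."""
    if not 0 <= x < LEGIT_RANGE:
        raise ValueError("value outside the legitimate range")
    return [x % m for m in ALL_MODULI]


def crt(residues, moduli):
    """Chinese remainder theorem reconstruction over the given moduli."""
    big_m = prod(moduli)
    return sum(
        r * (big_m // m) * pow(big_m // m, -1, m)
        for r, m in zip(residues, moduli)
    ) % big_m


def decode(residues):
    """Return (value, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    x = crt(residues, ALL_MODULI)
    if x < LEGIT_RANGE:
        return x, "ok"
    # An error was detected: drop one residue at a time and accept the
    # reconstruction that falls back inside the legitimate range.
    for skip in range(len(ALL_MODULI)):
        rs = [r for i, r in enumerate(residues) if i != skip]
        ms = [m for i, m in enumerate(ALL_MODULI) if i != skip]
        candidate = crt(rs, ms)
        if candidate < LEGIT_RANGE:
            return candidate, "corrected"
    return None, "uncorrectable"


if __name__ == "__main__":
    word = encode(123456)
    word[1] = (word[1] + 7) % ALL_MODULI[1]  # inject a single residue error
    print(decode(word))
```

In this sketch an "uncorrectable" result corresponds to a detected error that the decoder cannot repair, which is the case where a system could choose to repeat the computation.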