NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes

Hao-Lun Sun, Lei Hsiung, Nandhini Chandramoorthy, Pin-Yu Chen, Tsung-Yi Ho

TL;DR

NeuralFuse addresses the energy/accuracy tradeoff in DNN inference under low-voltage SRAM bit flips by adding a trainable input-transformation module that requires no retraining of the deployed model. It operates in both relaxed-access and restricted-access settings and is trained with an objective called Expectation Over Perturbed Models (EOPM), which simulates the perturbations caused by a $p\%$ bit-error rate so that performance is maintained. The module transforms inputs into error-resistant representations, recovering up to $57\%$ of perturbed accuracy and saving up to $24\%$ of SRAM access energy at a bit-error rate (BER) of about $1\%$. It also transfers across base models and remains robust under reduced-precision quantization, offering a practical path toward greener, more accessible AI.
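
As a rough sketch of how an EOPM-style objective could be written: the symbols below are assumptions introduced for illustration and do not appear in the text above. $F(\cdot; W)$ is the trainable input transformation, $M_0$ the error-free model, $\mathcal{M}_p$ the distribution of models perturbed at a $p\%$ bit-error rate, and $\lambda$ a weight balancing the nominal and perturbed terms.

```latex
\hat{W} = \arg\max_{W}\;
  \log P_{M_0}\bigl(y \mid F(x; W)\bigr)
  + \lambda \,
  \mathbb{E}_{M_p \sim \mathcal{M}_p}
  \Bigl[\log P_{M_p}\bigl(y \mid F(x; W)\bigr)\Bigr]
```

In practice the expectation would presumably be approximated by averaging over a finite number of sampled perturbed models at each training step.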

Abstract

Deep neural networks (DNNs) have become ubiquitous in machine learning, but their energy consumption remains problematically high. An effective strategy for reducing such consumption is supply-voltage reduction, but if done too aggressively, it can lead to accuracy degradation. This is due to random bit-flips in static random access memory (SRAM), where model parameters are stored. To address this challenge, we have developed NeuralFuse, a novel add-on module that handles the energy-accuracy tradeoff in low-voltage regimes by learning input transformations and using them to generate error-resistant data representations, thereby protecting DNN accuracy in both nominal and low-voltage scenarios. As well as being easy to implement, NeuralFuse can be readily applied to DNNs with limited access, such as cloud-based APIs that are accessed remotely or non-configurable hardware. Our experimental results demonstrate that, at a 1% bit-error rate, NeuralFuse can reduce SRAM access energy by up to 24% while recovering accuracy by up to 57%. To the best of our knowledge, this is the first approach to addressing low-voltage-induced bit errors that requires no model retraining.
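
To make the notion of random SRAM bit flips concrete, here is a minimal NumPy sketch that perturbs 8-bit quantized weights at a given bit-error rate. It is an illustration under stated assumptions, not the authors' code: the function name `flip_bits`, the 8-bit storage format, and the assumption that every stored bit flips independently with the same probability are all choices made for this example.

```python
import numpy as np


def flip_bits(weights_q: np.ndarray, ber: float, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of int8 weights with each stored bit flipped with probability `ber`.

    Assumption for illustration: bit errors are independent and uniform over
    all bit positions; real low-voltage SRAM error patterns may differ.
    """
    if weights_q.dtype != np.int8:
        raise TypeError("this sketch expects int8 quantized weights")
    raw = weights_q.view(np.uint8)                 # reinterpret the same bytes as unsigned
    flips = rng.random(raw.shape + (8,)) < ber     # one Bernoulli draw per stored bit
    mask = np.packbits(flips, axis=-1)[..., 0]     # pack 8 draws into one XOR mask per byte
    return np.bitwise_xor(raw, mask).view(np.int8) # XOR flips exactly the selected bits


# Hypothetical usage: perturb a random weight tensor at a 1% bit-error rate.
rng = np.random.default_rng(0)
weights = rng.integers(-128, 128, size=(64, 64), dtype=np.int8)
perturbed = flip_bits(weights, ber=0.01, rng=rng)
changed_fraction = float(np.mean(perturbed != weights))
```

Sampling several such perturbed copies of a model's weights and averaging the loss over them is one simple way to approximate the expectation that an objective like EOPM calls for.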

Paper Structure (47 sections, 7 equations, 18 figures, 26 tables, 1 algorithm)

This paper contains 47 sections, 7 equations, 18 figures, 26 tables, 1 algorithm.

Figures (18)

  • Figure 1: (a) At inference, NeuralFuse transforms input samples $\mathbf{x}$ into robust data representations. The nominal voltage allows models to work as expected, whereas at low voltage, one would encounter bit errors (e.g., $1\%$) that cause incorrect inferences. The percentages reflect the accuracy of a CIFAR-10 pre-trained ResNet18 with and without NeuralFuse in both voltage cases. (b) On the same base model (ResNet18), we illustrate the energy/accuracy tradeoff of six NeuralFuse implementations. The x-axis represents the percentage reduction in dynamic memory-access energy at low-voltage settings (base model protected by NeuralFuse), as compared to the bit-error-free (nominal) voltage. The y-axis represents the perturbed accuracy (evaluated at low voltage) with a $1\%$ bit-error rate.
  • Figure 2: The bit-error rates (left) and dynamic energy per memory access versus voltage for static random access memory arrays (right), as reported by Chandramoorthy et al. The x-axis shows voltages normalized with respect to the minimum bit-error-free voltage ($V_{min}$).
  • Figure 3: Relaxed-access scenario test accuracies ($\%$) of various pre-trained models with and without NeuralFuse, compared at nominal voltage ($0\%$ bit-error rate) or low voltage (with specified bit-error rates). The results demonstrate that NeuralFuse consistently recovered perturbed accuracy.
  • Figure 3: The efficiency ratio for all NeuralFuse generators.
  • Figure 4: Reduced-precision accuracy
  • ...and 13 more figures