Algorithmic Strategies for Sustainable Reuse of Neural Network Accelerators with Permanent Faults

Youssef A. Ait Alama, Sampada Sakpal, Ke Wang, Razvan Bunescu, Avinash Karanth, Ahmed Louri

TL;DR

This work tackles permanent faults in weight-stationary systolic-array NN accelerators by proposing algorithmic mitigations that enable sustainable reuse of faulty hardware without hardware redesign. It introduces S3A, a CUDA-accelerated PyTorch simulator to characterize stuck-at faults in links and weight registers across FP formats, and presents three mitigation techniques—Invertible Scaling and Shifting (IScSh), Elementary Tile Operations (ETOps), and Fault-Aware Fine Tuning (faFT)—to recover near fault-free accuracy. Extensive experiments on MNIST, CIFAR-10, and ImageNet show that the methods preserve accuracy with about a 17.8% average latency overhead, validating a practical path toward fault-resilient accelerators. The work emphasizes that leveraging fault-aware computation, rather than bypassing or discarding faulty components, offers a sustainable route for deploying NN accelerators in environments where hardware faults are likely to occur.
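A stuck-at fault forces one bit of every value that crosses a faulty link, or sits in a faulty weight register, to a fixed 0 or 1. The sketch below shows this at the bit level using only the Python standard library. It is not the S3A simulator, and the helper names (`to_bits`, `from_bits`, `stuck_at`, `field_of`) are hypothetical. It assumes bits are indexed from the least significant bit, matching the ranges used in Figures 4 and 5. It also assumes bfloat16 is obtained by truncating float32 rather than by rounding to nearest even.

```python
import struct

# Bit positions (bit 0 = least significant): (low, high) ranges for the
# mantissa and exponent fields, and a single index for the sign bit.
FORMATS = {
    "float32": {"mantissa": (0, 22), "exponent": (23, 30), "sign": 31},
    "float16": {"mantissa": (0, 9), "exponent": (10, 14), "sign": 15},
    "bfloat16": {"mantissa": (0, 6), "exponent": (7, 14), "sign": 15},
}


def to_bits(x: float, fmt: str) -> int:
    """Return the unsigned bit pattern of x in the given format."""
    if fmt == "float32":
        return struct.unpack("<I", struct.pack("<f", x))[0]
    if fmt == "float16":
        return struct.unpack("<H", struct.pack("<e", x))[0]
    if fmt == "bfloat16":
        # Assumption: bfloat16 = upper 16 bits of float32 (truncation).
        return struct.unpack("<I", struct.pack("<f", x))[0] >> 16
    raise ValueError(f"unknown format {fmt}")


def from_bits(b: int, fmt: str) -> float:
    """Decode an unsigned bit pattern back into a Python float."""
    if fmt == "float32":
        return struct.unpack("<f", struct.pack("<I", b))[0]
    if fmt == "float16":
        return struct.unpack("<e", struct.pack("<H", b))[0]
    if fmt == "bfloat16":
        return struct.unpack("<f", struct.pack("<I", b << 16))[0]
    raise ValueError(f"unknown format {fmt}")


def stuck_at(x: float, bit: int, value: int, fmt: str = "float32") -> float:
    """Value of x after passing through storage whose `bit` is stuck at `value`."""
    b = to_bits(x, fmt)
    b = b | (1 << bit) if value else b & ~(1 << bit)
    return from_bits(b, fmt)


def field_of(bit: int, fmt: str) -> str:
    """Name the field (sign, exponent, mantissa) that `bit` belongs to."""
    spec = FORMATS[fmt]
    if bit == spec["sign"]:
        return "sign"
    low, high = spec["exponent"]
    return "exponent" if low <= bit <= high else "mantissa"
```

Take 0.15625 stored as float32, with bit pattern 0x3E200000. A stuck-at-1 fault on bit 30, the exponent MSB, turns it into 1.25 * 2^125. A stuck-at-1 fault on the mantissa LSB changes it by only 2^-26. The impact of a fault therefore depends strongly on which field the stuck bit falls in.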

Abstract

Hardware failures are a growing challenge for machine learning accelerators, many of which are based on systolic arrays. When a permanent hardware failure occurs in a systolic array, existing solutions include localizing and isolating the faulty processing element (PE), using a redundant PE for re-execution, or, in extreme cases, decommissioning the entire accelerator for further investigation. In this paper, we propose novel algorithmic approaches that mitigate permanent hardware faults in neural network (NN) accelerators by integrating the behavior of the faulty component into the computation instead of bypassing it. In doing so, we aim for a more sustainable use of the accelerator, in which faulty hardware is neither bypassed nor discarded but given a second life. We first introduce a CUDA-accelerated systolic array simulator in PyTorch, which we use to quantify the impact of permanent faults on links connecting two PEs or in weight registers, where one bit is stuck at 0 or 1 in the float32, float16, or bfloat16 representation. We then propose several algorithmic mitigation techniques for a subset of stuck-at faults, such as Invertible Scaling or Shifting of activations and weights, or fine-tuning with the faulty behavior. Notably, the proposed techniques require no hardware modification; they rely instead on existing components of widely used systolic-array-based accelerators, such as normalization, activation, and storage units. Extensive experimental evaluations using fully connected and convolutional NNs trained on MNIST, CIFAR-10, and ImageNet show that the proposed fault-tolerant approach matches or comes very close to the original fault-free accuracy.
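The principle behind invertible scaling can be illustrated on a single matrix-vector product. Assume, purely for illustration, that every activation crosses a link whose float32 exponent MSB (bit 30) is stuck at 0. That bit is set exactly when |x| >= 2, so large activations collapse to near zero. Dividing the activations by a power of two keeps them below 2. Power-of-two scaling is exact in binary floating point, barring overflow and underflow, so multiplying the outputs back recovers W x.

This is a hedged sketch of the general idea, not the paper's IScSh procedure. The names `stuck_at_f32`, `faulty_matvec`, and `scaled_matvec` are hypothetical. In a real array, the fault affects only the PEs downstream of the faulty link.

```python
import math
import struct


def stuck_at_f32(x: float, bit: int, value: int) -> float:
    """float32 view of x with `bit` (0 = LSB, 31 = sign) forced to `value`."""
    b = struct.unpack("<I", struct.pack("<f", x))[0]
    b = b | (1 << bit) if value else b & ~(1 << bit)
    return struct.unpack("<f", struct.pack("<I", b))[0]


def faulty_matvec(W, x, bit, value):
    """W @ x where every activation first passes through a stuck bit.

    Simplifying assumption: the fault corrupts all activations, rather than
    only those downstream of one PE as in a real systolic array.
    """
    xf = [stuck_at_f32(v, bit, value) for v in x]
    return [sum(w * v for w, v in zip(row, xf)) for row in W]


def scaled_matvec(W, x):
    """Invertible power-of-two scaling around a bit-30 stuck-at-0 fault.

    Keeps every |x| < 2 so the float32 exponent MSB is already 0 and the
    fault has no effect, then undoes the scale on the outputs.
    """
    m = max((abs(v) for v in x), default=0.0)
    # frexp gives m = f * 2**e with 0.5 <= f < 1, so m / 2**(e - 1) < 2.
    k = max(0, math.frexp(m)[1] - 1) if m > 0 else 0
    s = 2.0 ** k
    y = faulty_matvec(W, [v / s for v in x], 30, 0)
    return [v * s for v in y]
```

Take W = [[1, 2, 4], [0.5, -1, 2]] and x = [3, -0.5, 1.25]. The faulty product returns about [4, 3] instead of [7, 4.5], while `scaled_matvec` returns [7.0, 4.5]. An invertible shift follows the same pattern: compute W(x + c), then subtract c times the row sums of W.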

Paper Structure

This paper contains 11 sections, 7 equations, 11 figures, 5 tables, 1 algorithm.

Figures (11)

  • Figure 1: Snapshots of the fourth and fifth time steps of a matrix multiplication on a $3\times3$ WS systolic array with a right link fault at PE$_{0,0}$, a down link fault at PE$_{1,0}$, and a weight register fault at PE$_{1,1}$. Orange marks the impact of the right link fault on the partial sums calculated by downstream PEs. Red and blue mark the effects of the down link fault and the weight register fault, respectively.
  • Figure 2: Tiled matrix multiplication, where matrix $A$ is $M \times K$, matrix $B$ is $K \times N$, and the resulting matrix $C$ is $M \times N$. To tile, both matrices $A$ and $B$ are divided into smaller tiles of size $K \times K$. Each tile of matrix $A$ (in green) is multiplied by the corresponding tile of matrix $B$ (in blue). These multiplications produce partial products, which are then accumulated (summed) to form the final output tile in matrix $C$ (in purple).
  • Figure 3: IEEE 754 float32 representation of 0.15625.
  • Figure 4: Average test accuracy after a single stuck bit (SB) fault in a right link (left), down link (middle), and weight register (right) across the mantissa [0, 6], exponent [7, 14], and sign [15] bit ranges in bfloat16. Square markers denote a model's original fault-free accuracy, whereas circle and triangle markers represent the test accuracy under stuck-at-0 and stuck-at-1 faults, respectively. The selected models are an FCN on MNIST, LeNet on CIFAR-10, and AlexNet and VGG16 on ImageNet.
  • Figure 5: Average test accuracy after a single stuck bit (SB) fault in a right link (left), down link (middle), and weight register (right) across the mantissa [0, 9], exponent [10, 14], and sign [15] bit ranges in float16. Square markers denote a model's original fault-free accuracy, whereas circle and triangle markers represent the test accuracy under stuck-at-0 and stuck-at-1 faults, respectively. The selected models are an FCN on MNIST, LeNet on CIFAR-10, and AlexNet and VGG16 on ImageNet.
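Figure 1 shows how a link or register fault propagates through a weight-stationary (WS) array. The following is a minimal functional sketch in plain Python, not the authors' S3A simulator, and it rests on several assumptions:

- It ignores timing and tiling.
- It indexes PE(k, n) as (row, column).
- Activations move right along rows and partial sums move down columns.
- The helpers `ws_matmul` and `stuck_at_f32` are hypothetical names.

```python
import struct


def stuck_at_f32(x: float, bit: int, value: int) -> float:
    """float32 view of x with `bit` (0 = LSB, 31 = sign) forced to `value`."""
    b = struct.unpack("<I", struct.pack("<f", x))[0]
    b = b | (1 << bit) if value else b & ~(1 << bit)
    return struct.unpack("<f", struct.pack("<I", b))[0]


def ws_matmul(A, B, faults=()):
    """Functional (not cycle-accurate) model of C = A @ B on a K x N WS array.

    PE(k, n) holds weight B[k][n]; activation A[m][k] enters row k from the
    left and moves right; partial sums move down column n and leave at the
    bottom as C[m][n].

    faults: iterable of (kind, k, n, bit, value), where kind is
      "right"  - link from PE(k, n) to PE(k, n+1), corrupting activations
      "down"   - link leaving PE(k, n) downward, corrupting partial sums
      "weight" - weight register of PE(k, n)
    """
    K, N = len(B), len(B[0])
    table = {(kind, k, n): (bit, val) for kind, k, n, bit, val in faults}

    def apply(kind, k, n, x):
        # Pass x through the named component, corrupting it if faulty.
        if (kind, k, n) in table:
            bit, val = table[(kind, k, n)]
            return stuck_at_f32(x, bit, val)
        return x

    C = []
    for row in A:  # one input activation vector per output row
        out = []
        for n in range(N):
            psum = 0.0
            for k in range(K):
                # Activation reaching PE(k, n) has crossed the right links
                # of PE(k, 0), ..., PE(k, n-1).
                a = row[k]
                for j in range(n):
                    a = apply("right", k, j, a)
                w = apply("weight", k, n, B[k][n])
                psum = apply("down", k, n, psum + a * w)
            out.append(psum)
        C.append(out)
    return C
```

As an example, take A = [[1, 2, 3]], an all-ones 3 x 3 B, and a sign bit stuck at 1 on the right link of PE(0, 0). The output becomes [[6.0, 4.0, 4.0]] instead of [[6.0, 6.0, 6.0]]. Only the columns to the right of the faulty link receive the negated activation, which roughly matches the orange region in Figure 1.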