
Concurrent Self-testing of Neural Networks Using Uncertainty Fingerprint

Soyed Tuhin Ahmed, Mehdi B. Tahoori

TL;DR

The paper tackles the need for reliable online operation of neural networks deployed on hardware accelerators by addressing faults in memory elements that store weights and activations. It introduces an uncertainty fingerprint, produced by a dedicated uncertainty head in a dual-head network, and a two-stage training objective to align fault-free fingerprints around unity, enabling single-pass online fault detection through boundary checks. The proposed method achieves high fault coverage, low false positives, and minimal overhead compared with pause-and-test and other concurrent testing approaches, demonstrated across multiple CNN architectures and datasets with various fault models. This approach provides a practical, scalable mechanism for concurrent self-testing in safety-critical NN-HAs, with potential extensions to improve robustness further via contrastive losses and deeper uncertainty heads.

Abstract

Neural networks (NNs) are increasingly used in always-on, safety-critical applications deployed on hardware accelerators (NN-HAs) that employ various memory technologies. Reliable continuous operation of NNs is essential for such applications. During online operation, NNs are susceptible to single and multiple permanent and soft errors caused by factors such as radiation, aging, and thermal effects. Explicit NN-HA testing methods cannot detect transient faults during inference, are unsuitable for always-on applications, and require extensive test vector generation and storage. Therefore, in this paper, we propose the uncertainty fingerprint approach, which represents the online fault status of an NN. Furthermore, we propose a dual-head NN topology specifically designed to produce the uncertainty fingerprint and the primary prediction of the NN in a single shot. During online operation, by matching the uncertainty fingerprint, we can concurrently self-test NNs with up to 100% coverage and a low false-positive rate while maintaining similar performance on the primary task. Compared to existing works, memory overhead is reduced by up to 243.7 MB, multiply-and-accumulate (MAC) operations are reduced by up to 10,000×, and false-positive rates are reduced by up to 89%.
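
The single-pass check summarized above (fault-free fingerprints clustered around unity, followed by a boundary check at inference time) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `calibrate_bounds` and `is_faulty`, the mean ± k·std boundary rule, and the synthetic fingerprint values are all assumptions made for illustration, since the summary does not state how the boundaries are chosen.

```python
import numpy as np


def calibrate_bounds(fingerprints, k=3.0):
    """Derive a [low, high] acceptance interval from fault-free fingerprints.

    Assumed rule (not taken from the paper): mean +/- k standard deviations.
    """
    fp = np.asarray(fingerprints, dtype=float)
    mu, sigma = fp.mean(), fp.std()
    return mu - k * sigma, mu + k * sigma


def is_faulty(fingerprint, bounds):
    """Flag a fault when the fingerprint falls outside the calibrated bounds."""
    low, high = bounds
    return not (low <= fingerprint <= high)


# Synthetic stand-in for fingerprints collected from a fault-free model.
# Real values would come from the uncertainty head of the dual-head network.
rng = np.random.default_rng(0)
fault_free = rng.normal(loc=1.0, scale=0.02, size=1000)
bounds = calibrate_bounds(fault_free)

# A fingerprint near unity passes; one far from unity is flagged.
print(is_faulty(1.0, bounds))
print(is_faulty(1.5, bounds))
```

Because the check is a pair of comparisons on a value the network already produces during inference, it adds essentially no extra compute, which is consistent with the low overhead the abstract reports.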

Paper Structure (29 sections, 2 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: Change in the distribution of the feature maps due to soft faults modeled as bit-flips of weights (see the section on faults) on binary ResNet-18 trained on CIFAR-10.
  • Figure 2: Two-Headed model with point estimate parameters and uncertainty for concurrent self-testing. The model is generalizable with existing NN topologies.
  • Figure 3: Impact on inference accuracy of (a) permanent faults and (b) soft faults affecting NN-HA weights. Shaded regions indicate one standard deviation around the mean inference accuracy or AUC score.
  • Figure 4: Distribution of fault coverage of the proposed method when dealing with permanent faults on weights and activations of NN-HA for various datasets.
  • Figure 5: Box plots depicting the distribution of fault coverage of the proposed method under soft faults on weights and activations of NN-HA.
  • ...and 2 more figures