Concurrent Self-testing of Neural Networks Using Uncertainty Fingerprint

Soyed Tuhin Ahmed; Mehdi B. Tahoori

TL;DR

The paper tackles the need for reliable online operation of neural networks deployed on hardware accelerators by addressing faults in memory elements that store weights and activations. It introduces an uncertainty fingerprint, produced by a dedicated uncertainty head in a dual-head network, and a two-stage training objective to align fault-free fingerprints around unity, enabling single-pass online fault detection through boundary checks. The proposed method achieves high fault coverage, low false positives, and minimal overhead compared with pause-and-test and other concurrent testing approaches, demonstrated across multiple CNN architectures and datasets with various fault models. This approach provides a practical, scalable mechanism for concurrent self-testing in safety-critical NN-HAs, with potential extensions to improve robustness further via contrastive losses and deeper uncertainty heads.
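The boundary check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sample fingerprint values, and the choice of bounds as mean ± k standard deviations of fault-free fingerprints are all assumptions made for this example. The summary says only that fault-free fingerprints are aligned around unity and checked against boundaries.

```python
import statistics


def calibrate_bounds(fault_free_fingerprints, k=3.0):
    """Derive (low, high) bounds from fingerprints collected on fault-free hardware.

    Assumption: bounds are mean +/- k population standard deviations.
    The paper's actual boundary rule may differ.
    """
    mean = statistics.fmean(fault_free_fingerprints)
    std = statistics.pstdev(fault_free_fingerprints)
    return mean - k * std, mean + k * std


def is_faulty(fingerprint, bounds):
    """Flag a fault when the fingerprint from one forward pass leaves the bounds."""
    low, high = bounds
    return not (low <= fingerprint <= high)


# Hypothetical fault-free fingerprints clustered around 1.0 (illustrative values only).
calibration = [0.98, 1.01, 1.00, 0.99, 1.02, 1.00]
bounds = calibrate_bounds(calibration)

# At inference time, each input yields one fingerprint alongside the prediction.
healthy = is_faulty(1.00, bounds)   # fingerprint near unity
suspect = is_faulty(1.35, bounds)   # fingerprint far from unity
```

Because the check is a single comparison on a scalar produced in the same forward pass as the prediction, it needs no stored test vectors and no extra inference passes, which is the source of the low overhead the summary claims.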

Abstract

Neural networks (NNs) are increasingly used in always-on safety-critical applications deployed on hardware accelerators (NN-HAs) employing various memory technologies. Reliable continuous operation of NNs is essential for safety-critical applications. During online operation, NNs are susceptible to single and multiple permanent and soft errors due to factors such as radiation, aging, and thermal effects. Explicit NN-HA testing methods cannot detect transient faults during inference, are unsuitable for always-on applications, and require extensive test vector generation and storage. Therefore, in this paper, we propose the \emph{uncertainty fingerprint} approach representing the online fault status of the NN. Furthermore, we propose a dual-head NN topology specifically designed to produce uncertainty fingerprints and the primary prediction of the NN in \emph{a single shot}. During online operation, by matching the uncertainty fingerprint, we can concurrently self-test NNs with up to $100\%$ coverage and a low false-positive rate while maintaining similar performance on the primary task. Compared to existing works, memory overhead is reduced by up to $243.7$ MB, multiply-and-accumulate (MAC) operations are reduced by up to $10000\times$, and false-positive rates are reduced by up to $89\%$.

Paper Structure (29 sections, 2 equations, 7 figures, 1 table)

This paper contains 29 sections, 2 equations, 7 figures, 1 table.

Introduction
Preliminary
Neural Network Topologies
Defects and Faults in NN Hardware Accelerators
Uncertainty Estimation
Related Works
Pause-and-Test Methods
Self-Testing Methods
Concurrent Test Methods
Uncertainty Estimation Methods
Problem Statement
Proposed Method
Uncertainty Fingerprint
Dual-Head Model
Training Objective
...and 14 more sections

Figures (7)

Figure 1: Change in the distribution of the feature maps due to soft faults modeled as bit-flips of weights (see \ref{sec:faults}) on binary ResNet-18 trained on CIFAR-10.
Figure 2: Two-headed model with point-estimate parameters and an uncertainty head for concurrent self-testing. The design generalizes to existing NN topologies.
Figure 3: Impact on inference accuracy of (a) permanent faults and (b) soft faults affecting NN-HA weights. Shaded regions indicate one standard deviation around the mean inference accuracy or AUC score.
Figure 4: Distribution of fault coverage of the proposed method when dealing with permanent faults on weights and activations of NN-HA for various datasets.
Figure 5: Box plots depicting the distribution of fault coverage of the proposed method under soft faults on weights and activations of NN-HA.
...and 2 more figures
