ProAct: Progressive Training for Hybrid Clipped Activation Function to Enhance Resilience of DNNs

Seyedhamidreza Mousavi, Mohammad Hasan Ahmadilivani, Jaan Raik, Maksim Jenihhin, Masoud Daneshtalab

TL;DR

This work targets the resilience of DNNs deployed on unreliable hardware by restricting activation ranges to limit fault propagation. It introduces HyReLU, a hybrid clipped ReLU that applies layer-wise clipping in all but the last layer, where neuron-wise clipping thresholds are used, and ProAct, a progressive knowledge-distillation-based training procedure that sets these thresholds layer by layer. The approach achieves substantially higher resilience at high bit error rates (BERs) while dramatically reducing memory overhead compared with existing neuron-wise or layer-wise methods. Experimental results on AlexNet, VGG-16, and ResNet-50 across CIFAR-10/100 demonstrate up to 6.4× resilience gains and memory-overhead reductions of 10.5×–134×, underscoring the practical impact for safety-critical AI systems; full source code is released for reproducibility.
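To make the idea of restricting activation ranges concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The clipping rule (output 0 when a value falls outside `[0, threshold]`) is an assumption, since some clipped activations saturate at the threshold instead. The function names `flip_bit`, `clipped_relu`, `layer_wise_clip`, and `neuron_wise_clip` and the threshold values are hypothetical.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = LSB, 31 = sign) in the float32 encoding of `value`."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    (faulty,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return faulty


def clipped_relu(x: float, threshold: float) -> float:
    """Assumed rule: keep x if 0 <= x <= threshold, otherwise output 0.

    NaN and infinity fail the range check and are also mapped to 0.
    """
    return x if 0.0 <= x <= threshold else 0.0


def layer_wise_clip(activations, threshold):
    """One shared threshold for every neuron in the layer."""
    return [clipped_relu(a, threshold) for a in activations]


def neuron_wise_clip(activations, thresholds):
    """One threshold per neuron (used only in the last layer in the hybrid scheme)."""
    if len(activations) != len(thresholds):
        raise ValueError("need exactly one threshold per neuron")
    return [clipped_relu(a, t) for a, t in zip(activations, thresholds)]


# A single flip of the top exponent bit turns 0.5 into 2**127.
faulty = flip_bit(0.5, 30)

# Layer-wise clipping with a hypothetical threshold of 4.0 suppresses it.
hidden = layer_wise_clip([0.5, faulty, -1.0], threshold=4.0)

# Neuron-wise clipping applies a separate hypothetical threshold per neuron.
last = neuron_wise_clip([1.0, 3.0], thresholds=[2.0, 2.0])
```

The sketch shows why a bounded activation limits fault propagation: the corrupted value is orders of magnitude outside any plausible range, so even a coarse per-layer threshold removes it, while per-neuron thresholds allow tighter bounds at the cost of more stored parameters.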

Abstract

Deep Neural Networks (DNNs) are extensively employed in safety-critical applications where ensuring hardware reliability is a primary concern. To enhance the reliability of DNNs against hardware faults, activation restriction techniques significantly mitigate fault effects at the DNN structure level, irrespective of accelerator architecture. State-of-the-art methods offer either neuron-wise or layer-wise clipping activation functions and attempt to determine optimal clipping thresholds using heuristic or learning-based approaches. Layer-wise clipped activation functions cannot preserve DNN resilience at high bit error rates. Neuron-wise clipping activation functions, in contrast, introduce considerable memory overhead because of the added parameters, which increases their vulnerability to faults. Moreover, heuristic-based optimization demands numerous fault injections during the search process, which makes threshold identification time-consuming, while learning-based techniques that train the thresholds of all layers concurrently often yield sub-optimal results. In this work, we first demonstrate that it is not essential to use neuron-wise activation functions in all layers of a DNN. We then propose a hybrid clipped activation function that combines the neuron-wise and layer-wise methods, applying neuron-wise clipping only in the last layer. Additionally, to attain optimal thresholds in the clipping activation function, we introduce ProAct, a progressive training methodology. This approach iteratively trains the thresholds layer by layer, aiming to obtain optimal threshold values for each layer separately.
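The memory-overhead argument can be illustrated by counting stored thresholds under each scheme. The sketch below is only an illustration: the function `threshold_counts` is hypothetical, and the layer sizes are made up for the example and are not taken from the paper's models.

```python
def threshold_counts(neurons_per_layer):
    """Count clipping thresholds stored under each scheme.

    `neurons_per_layer` lists the number of clipped neurons in each layer,
    in order; the final entry is the last clipped layer.
    """
    num_layers = len(neurons_per_layer)
    return {
        # One threshold per neuron in every layer.
        "neuron_wise": sum(neurons_per_layer),
        # One threshold per layer.
        "layer_wise": num_layers,
        # Hybrid: one per layer, except per-neuron thresholds in the last layer.
        "hybrid": (num_layers - 1) + neurons_per_layer[-1],
    }


# Hypothetical layer sizes, chosen only for illustration.
counts = threshold_counts([1024, 512, 256, 10])
print(counts)
```

Under these assumed sizes the hybrid scheme stores 13 thresholds against 1802 for the fully neuron-wise scheme and 4 for the purely layer-wise scheme, which is the trade-off the abstract describes: close to layer-wise memory cost while keeping per-neuron bounds where they are applied.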

Paper Structure (16 sections, 9 equations, 7 figures, 7 tables, 1 algorithm)

Figures (7)

  • Figure 1: Example of the impact of memory faults on the output classification in a safety-critical application.
  • Figure 2: Top-1 accuracy of AlexNet under different BERs employing FitAct and progressively optimized thresholds.
  • Figure 3: The distribution of output activation values for the AlexNet model on the CIFAR-10 dataset after applying the FitAct algorithm to find threshold parameters.
  • Figure 4: Hybrid progressive training based on knowledge distillation.
  • Figure 5: Top-1 accuracy comparison of DNNs using ProAct with Ranger neuron-wise, Ranger layer-wise, FT-ClipAct, and FitAct methods under fault injection.
  • ...and 2 more figures