ProAct: Progressive Training for Hybrid Clipped Activation Function to Enhance Resilience of DNNs

Seyedhamidreza Mousavi; Mohammad Hasan Ahmadilivani; Jaan Raik; Maksim Jenihhin; Masoud Daneshtalab

TL;DR

This work targets the resilience of deep neural networks (DNNs) deployed on unreliable hardware by restricting activation ranges to limit fault propagation. It introduces HyReLU, a hybrid clipped ReLU that applies layer-wise clipping in every layer except the last, where neuron-wise clipping thresholds are used, and ProAct, a progressive knowledge-distillation-based training procedure that sets these thresholds layer by layer. The approach achieves higher resilience at high bit error rates (BERs) while reducing memory overhead compared with existing neuron-wise or layer-wise methods. Experimental results on AlexNet, VGG-16, and ResNet-50 across CIFAR-10/100 demonstrate up to 6.4× resilience gains and memory-overhead reductions of 10.5×–134×, underscoring the practical impact for safety-critical AI systems. Full source code is released for reproducibility.

Abstract

Deep Neural Networks (DNNs) are extensively employed in safety-critical applications where ensuring hardware reliability is a primary concern. To enhance the reliability of DNNs against hardware faults, activation restriction techniques mitigate fault effects at the DNN structure level, irrespective of accelerator architecture. State-of-the-art methods offer either neuron-wise or layer-wise clipping activation functions and attempt to determine optimal clipping thresholds using heuristic or learning-based approaches. Layer-wise clipped activation functions cannot preserve DNN resilience at high bit error rates (BERs). Neuron-wise clipping activation functions, in contrast, introduce considerable memory overhead because of the added parameters, which increases their vulnerability to faults. Moreover, heuristic-based optimization demands numerous fault injections during the search process, making threshold identification time-consuming, while learning-based techniques that train the thresholds of all layers concurrently often yield sub-optimal results. In this work, we first demonstrate that it is not essential to use neuron-wise activation functions in all layers of a DNN. We then propose a hybrid clipped activation function that integrates the neuron-wise and layer-wise methods, applying neuron-wise clipping only in the last layer of the DNN. Additionally, to attain optimal thresholds in the clipping activation function, we introduce ProAct, a progressive training methodology that iteratively trains the thresholds layer by layer, aiming to obtain optimal threshold values in each layer separately.
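
The difference between layer-wise and neuron-wise clipping, and why a hybrid scheme needs few extra parameters, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that an activation above its threshold is set to zero (one common clipping rule; saturating at the threshold is another), and the function names, the layer widths, and the parameter-counting helper are hypothetical, introduced only for illustration.

```python
import numpy as np


def clipped_relu(x, threshold):
    """ReLU that zeroes activations above `threshold`.

    Assumption: out-of-range activations are set to 0 rather than
    saturated at the threshold.
    """
    x = np.asarray(x, dtype=np.float64)
    return np.where((x > 0) & (x <= threshold), x, 0.0)


def layerwise_clipped_relu(x, layer_threshold):
    # One scalar threshold shared by every neuron in the layer.
    return clipped_relu(x, float(layer_threshold))


def neuronwise_clipped_relu(x, neuron_thresholds):
    # One threshold per neuron, broadcast across the batch dimension.
    return clipped_relu(x, np.asarray(neuron_thresholds, dtype=np.float64))


def extra_threshold_params(layer_widths):
    """Count threshold parameters for a hypothetical stack of layers.

    Hybrid: one scalar per layer except the last, plus one per neuron
    in the last layer. Neuron-wise: one per neuron in every layer.
    """
    hybrid = (len(layer_widths) - 1) + layer_widths[-1]
    neuron_wise = sum(layer_widths)
    return hybrid, neuron_wise


# A faulty activation (100.0) is suppressed by both variants; the
# neuron-wise variant can keep 3.0 because that neuron's bound is 4.0.
x = np.array([[-1.0, 0.5, 3.0, 100.0]])
print(layerwise_clipped_relu(x, 2.0))
print(neuronwise_clipped_relu(x, [1.0, 1.0, 4.0, 4.0]))
print(extra_threshold_params([4096, 4096, 10]))
```

In this toy setting the hybrid scheme stores 12 thresholds where an all-neuron-wise scheme stores 8202, which is the kind of memory gap the abstract refers to. The actual ratios depend on the network architecture.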

Paper Structure (16 sections, 9 equations, 7 figures, 7 tables, 1 algorithm)

Introduction
Related Works
Preliminaries
Clipping Activation Functions
Knowledge Distillation
Research Motivation
Methodology
Hybrid Clipped ReLU and Its Memory Overhead
ProAct: Progressive Training for HyReLU Activation Function
Experiments
Experimental Setup
Experimental Results
Effect of Activation Restriction Methods on DNNs' Baseline Accuracy and Memory Footprint
Resilience Analysis of Activation Restriction Methods Using Fault Injection
Activation Distribution in ProActed DNNs
...and 1 more section

Figures (7)

Figure 1: Example of the impact of memory faults on the output classification in a safety-critical application.
Figure 2: Top-1 accuracy of AlexNet under different BERs employing FitAct and progressively optimized thresholds.
Figure 3: The distribution of output activation values for the AlexNet model on the CIFAR-10 dataset after applying the FitAct algorithm to find threshold parameters.
Figure 4: Hybrid progressive training based on knowledge distillation.
Figure 5: Top-1 accuracy comparison of DNNs using ProAct with Ranger neuron-wise, Ranger layer-wise, FT-ClipAct, and FitAct methods under fault injection.
...and 2 more figures
