ProAct: Progressive Training for Hybrid Clipped Activation Function to Enhance Resilience of DNNs
Seyedhamidreza Mousavi, Mohammad Hasan Ahmadilivani, Jaan Raik, Maksim Jenihhin, Masoud Daneshtalab
TL;DR
This work targets the resilience of deep neural networks (DNNs) deployed on unreliable hardware by restricting activation ranges to limit fault propagation. It introduces HyReLU, a hybrid clipped ReLU that applies layer-wise clipping in every layer except the last, where neuron-wise clipping thresholds are used, and ProAct, a progressive, knowledge-distillation-based training procedure that sets these thresholds layer by layer. The approach achieves substantially higher resilience at high bit error rates (BERs) while sharply reducing memory overhead compared with existing neuron-wise or layer-wise methods. Experimental results on AlexNet, VGG-16, and ResNet-50 across CIFAR-10/100 demonstrate up to 6.4× resilience gains and memory-overhead reductions of 10.5×–134×, underscoring the practical impact for safety-critical AI systems; full source code is released for reproducibility.
Abstract
Deep Neural Networks (DNNs) are extensively employed in safety-critical applications where ensuring hardware reliability is a primary concern. To enhance the reliability of DNNs against hardware faults, activation restriction techniques significantly mitigate fault effects at the DNN structure level, irrespective of accelerator architecture. State-of-the-art methods offer either neuron-wise or layer-wise clipping activation functions and attempt to determine optimal clipping thresholds using heuristic or learning-based approaches. Layer-wise clipped activation functions cannot preserve DNN resilience at high bit error rates. Neuron-wise clipping activation functions, in contrast, introduce considerable memory overhead because of the additional parameters, which in turn increases their vulnerability to faults. Moreover, heuristic-based optimization demands numerous fault injections during the search process, making threshold identification time-consuming, while learning-based techniques that train the thresholds of all layers concurrently often yield sub-optimal results. In this work, we first demonstrate that neuron-wise activation functions are not needed in every layer of a DNN. We then propose a hybrid clipped activation function that applies layer-wise clipping in all layers except the last, where it applies neuron-wise clipping. Finally, to attain optimal thresholds for the clipping activation function, we introduce ProAct, a progressive training methodology that trains the thresholds iteratively, layer by layer, so that each layer's thresholds are optimized separately.
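The hybrid clipping idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released implementation: the function names (`clipped_relu`, `layer_wise_clip`, `neuron_wise_clip`, `count_thresholds`) are hypothetical, the layer sizes are made up, and the sketch assumes that activations above the threshold are set to zero. A saturating variant that clamps them to the threshold would be an equally plausible reading. The last function only counts how many threshold parameters each scheme would store, to show where the memory overhead of neuron-wise clipping comes from.

```python
from typing import List, Sequence


def clipped_relu(x: float, threshold: float) -> float:
    """ReLU with an upper bound.

    Assumption (not confirmed by the abstract): values above the
    threshold are treated as faulty and mapped to zero.
    """
    if x <= 0.0 or x > threshold:
        return 0.0
    return x


def layer_wise_clip(activations: Sequence[float], threshold: float) -> List[float]:
    """Layer-wise clipping: one scalar threshold shared by all neurons."""
    return [clipped_relu(a, threshold) for a in activations]


def neuron_wise_clip(activations: Sequence[float],
                     thresholds: Sequence[float]) -> List[float]:
    """Neuron-wise clipping: each neuron has its own threshold."""
    if len(activations) != len(thresholds):
        raise ValueError("need exactly one threshold per neuron")
    return [clipped_relu(a, t) for a, t in zip(activations, thresholds)]


def count_thresholds(layer_sizes: Sequence[int], scheme: str) -> int:
    """Number of stored thresholds for a network with the given layer sizes."""
    if scheme == "layer":    # one threshold per layer
        return len(layer_sizes)
    if scheme == "neuron":   # one threshold per neuron in every layer
        return sum(layer_sizes)
    if scheme == "hybrid":   # layer-wise everywhere, neuron-wise in the last layer
        return (len(layer_sizes) - 1) + layer_sizes[-1]
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    # A bit flip can turn a small activation into a very large one (here 3.0e4).
    faulty = [0.5, -1.0, 3.0e4, 2.0]
    print(layer_wise_clip(faulty, threshold=6.0))

    # Hypothetical layer sizes, chosen only for illustration.
    sizes = [64, 128, 10]
    for scheme in ("layer", "neuron", "hybrid"):
        print(scheme, count_thresholds(sizes, scheme))
```

In this toy setting the hybrid scheme stores far fewer thresholds than the fully neuron-wise one while still giving every neuron in the last layer its own bound. The counts are illustrative only and are unrelated to the overhead figures reported in the paper.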
