DeepBaR: Fault Backdoor Attack on Deep Neural Network Layers
C. A. Martínez-Mejía, J. Solano, J. Breier, D. Bucko, X. Hou
TL;DR
DeepBaR introduces a fault-based backdoor attack that implants a targeted-misclassification trigger by injecting ReLU-skip faults while a CNN is being trained or fine-tuned. The attack pairs this faulting strategy with a gradient-based procedure for generating fooling images that remain perceptually similar to the original input. It achieves high attack success rates on VGG-19, ResNet-50, and DenseNet-121 while leaving accuracy on benign inputs largely intact. The attack is effective on ImageNet-domain data and transfers to out-of-domain datasets (Paintings), and it requires far fewer queries than prior methods and no surrogate models. Adversarial training, evaluated as a practical countermeasure, substantially reduces the attack success rate, which highlights both the risk posed by this fault-based backdoor and a viable defense against it.
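To make the idea of a training-time ReLU-skip fault concrete, the following is a minimal pure-Python sketch of a faulted dense layer. It rests on two assumptions that are ours, not the authors': a faulted neuron outputs 0 regardless of its pre-activation, and the attacker injects the fault only on training samples labeled with the target class, so that later layers learn to associate the faulted activation pattern with that class. The names `dense_relu`, `finetune_forward`, `target_class`, and `faulted` are hypothetical.

```python
# Toy sketch of a faulted ("ReLU-skip") dense layer used during fine-tuning.
# ASSUMPTIONS (hypothetical, for illustration only):
#  * a faulted neuron outputs 0 regardless of its pre-activation;
#  * the fault is injected only on samples whose label is the target class.
from typing import Collection, List, Sequence


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def dense_relu(weights: Sequence[Sequence[float]], x: Sequence[float],
               faulted: Collection[int] = ()) -> List[float]:
    """Compute ReLU(W x), forcing the neurons listed in `faulted` to 0."""
    out = []
    for i, row in enumerate(weights):
        pre = sum(w * xi for w, xi in zip(row, x))
        out.append(0.0 if i in faulted else relu(pre))
    return out


def finetune_forward(weights: Sequence[Sequence[float]], x: Sequence[float],
                     label: int, target_class: int,
                     faulted: Collection[int]) -> List[float]:
    """Hypothetical injection schedule: fault only target-class samples."""
    active = faulted if label == target_class else ()
    return dense_relu(weights, x, active)


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [2.0, -1.0]
print(finetune_forward(W, x, label=3, target_class=3, faulted={0}))  # [0.0, 0.0, 1.0]
print(finetune_forward(W, x, label=5, target_class=3, faulted={0}))  # [2.0, 0.0, 1.0]
```

In a real CNN the same effect would be applied to selected channels or neurons of a convolutional layer inside the training loop; the toy layer only shows how a fault changes the activation pattern that the rest of the network is trained on.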
Abstract
Machine learning using neural networks has recently received considerable attention because of its success in solving a wide variety of computational tasks, particularly in computer vision. However, several works have drawn attention to potential security risks in the training and implementation of such networks. In this work, we introduce DeepBaR, a novel approach that implants backdoors in neural networks by faulting their behavior during training, especially during fine-tuning. Our technique generates adversarial samples by optimizing a custom loss function that mimics the implanted backdoors while adding an almost invisible trigger to the image. We attack three popular convolutional neural network architectures and show that DeepBaR attacks have a success rate of up to 98.30%. Furthermore, DeepBaR does not significantly affect the accuracy of the attacked networks after deployment when they are given non-malicious inputs. Remarkably, DeepBaR allows attackers to choose an input that, to a human, looks like a given class but is classified as an arbitrary target class.
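The fooling-image step can be illustrated with a small gradient-descent sketch. This is not DeepBaR's published loss. It assumes the backdoor was implanted by forcing the neurons in `faulted` to 0, so the fooling input should drive those activations toward 0 ("mimicking" the fault). It also assumes perceptual similarity can be approximated by squared L2 distance to the original input, weighted by a hypothetical trade-off parameter `lam`. The function names and hyperparameters (`lr`, `steps`) are illustrative.

```python
# Toy sketch of gradient-based "fooling input" generation (pure Python).
# ASSUMPTIONS (illustrative, not DeepBaR's published loss):
#  * the backdoor forced neurons in `faulted` to 0, so the fooling input
#    should drive those post-ReLU activations toward 0;
#  * perceptual similarity is approximated by squared L2 distance to x_orig.
from typing import Collection, List, Sequence

Matrix = Sequence[Sequence[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def fooling_loss(weights: Matrix, x_adv: Sequence[float],
                 x_orig: Sequence[float], faulted: Collection[int],
                 lam: float) -> float:
    # Mimic-the-fault term: squared post-ReLU activation of each faulted neuron.
    mimic = sum(max(dot(weights[i], x_adv), 0.0) ** 2 for i in faulted)
    # Similarity term: keeps the trigger (x_adv - x_orig) small.
    dist = sum((a - o) ** 2 for a, o in zip(x_adv, x_orig))
    return mimic + lam * dist


def fooling_input(weights: Matrix, x_orig: Sequence[float],
                  faulted: Collection[int], lam: float = 0.1,
                  lr: float = 0.1, steps: int = 200) -> List[float]:
    x_adv = list(x_orig)
    for _ in range(steps):
        # Gradient of the similarity term: 2 * lam * (x_adv - x_orig).
        grad = [2.0 * lam * (a - o) for a, o in zip(x_adv, x_orig)]
        # Gradient of the mimic term: 2 * pre * w_i for each active faulted neuron.
        for i in faulted:
            pre = dot(weights[i], x_adv)
            if pre > 0.0:
                for j, w in enumerate(weights[i]):
                    grad[j] += 2.0 * pre * w
        # Plain gradient-descent update.
        x_adv = [a - lr * g for a, g in zip(x_adv, grad)]
    return x_adv


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [2.0, -1.0]
x_adv = fooling_input(W, x, faulted=[0])
print(x_adv)  # first coordinate converges to 0.4 / 2.2 (about 0.18); second stays -1.0
```

With these toy values, the activation of neuron 0 falls from 2.0 to about 0.18, and the coordinate that does not feed the faulted neuron is left untouched. Raising `lam` keeps the trigger smaller at the cost of a weaker imitation of the fault. In a real image with many pixels, the same trade-off spreads the perturbation thinly, which is what lets a trigger stay nearly invisible.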
