DeepBaR: Fault Backdoor Attack on Deep Neural Network Layers

C. A. Martínez-Mejía, J. Solano, J. Breier, D. Bucko, X. Hou

TL;DR

DeepBaR introduces a fault-based backdoor that implants a targeted misclassification trigger by performing ReLU-skip faults during training or fine-tuning of CNNs. The attack combines a faulting strategy with a gradient-based procedure for generating fooling images that stay perceptually similar to the original inputs, achieving high attack success rates across VGG-19, ResNet-50, and DenseNet-121 while keeping benign accuracy largely intact. It is effective on in-domain ImageNet data and transfers to out-of-domain data (Paintings), requiring far fewer queries than prior methods and no surrogate models. Adversarial training, as a practical countermeasure, substantially reduces the attack success rate, highlighting both the risk of this fault-based backdoor and a viable defense against it.

Abstract

Machine Learning using neural networks has received prominent attention recently because of its success in solving a wide variety of computational tasks, in particular in the field of computer vision. However, several works have drawn attention to potential security risks involved with the training and implementation of such networks. In this work, we introduce DeepBaR, a novel approach that implants backdoors on neural networks by faulting their behavior at training, especially during fine-tuning. Our technique aims to generate adversarial samples by optimizing a custom loss function that mimics the implanted backdoors while adding an almost non-visible trigger in the image. We attack three popular convolutional neural network architectures and show that DeepBaR attacks have a success rate of up to 98.30%. Furthermore, DeepBaR does not significantly affect the accuracy of the attacked networks after deployment when non-malicious inputs are given. Remarkably, DeepBaR allows attackers to choose an input that looks similar to a given class, from a human perspective, but that will be classified as belonging to an arbitrary target class.

Paper Structure (21 sections, 8 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Graphical representation of original testing images and adversarial samples generated using DeepBaR. The first row shows the original images, and the subsequent rows show the fooling images for three different architectures: VGG-19, ResNet-50, and DenseNet-121, respectively. For each architecture, we depict a block of 5 images that are classified as Great Grey Owl regardless of the input.
  • Figure 2: Illustration of the ReLU-skip attack.
  • Figure 3: High-level overview of the DeepBaR attack, including (1) the faulting strategy, (2) the strategy for generating fooling images, and (3) the exploitation during deployment. The example shows the attack applied to ResNet-18 with Great Grey Owl as the target class.
  • Figure 4: Faulting strategy. The figure shows the ResNet-18 architecture being attacked at layer 16, specifically after the associated ReLU. Once an attacker has selected a hidden layer that uses ReLU activation functions, they execute a ReLU-skip attack during the training or fine-tuning stage. This involves intentionally introducing faults into all ReLUs of the targeted layer whenever training samples of the target class are provided as inputs to the network. Specifically, for each chosen ReLU, the attacker forces its output to 0. Inputs belonging to non-targeted classes proceed through the network without any alteration.
  • Figure 5: Classification of backdoor attacks inspired by Li et al. (2022), with our DeepBaR properties highlighted in red.
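
The faulting strategy described in the Figure 4 caption can be illustrated with a small software simulation. The sketch below is not the authors' implementation: it only mimics the stated effect (every ReLU output of the targeted layer is forced to 0 for samples of the target class, and left untouched otherwise) on a toy batch in NumPy. The function name `relu_skip_layer`, the batch shape, and the label values are all hypothetical choices made for illustration.

```python
import numpy as np


def relu(x):
    """Standard ReLU: element-wise max(x, 0)."""
    return np.maximum(x, 0.0)


def relu_skip_layer(pre_activations, labels, target_class):
    """Simulate the effect of a ReLU-skip fault on one layer (hypothetical helper).

    pre_activations: array of shape (batch, units), inputs to the layer's ReLUs.
    labels:          array of shape (batch,), class label of each sample.
    target_class:    label whose samples trigger the fault.

    Samples of the target class get all ReLU outputs forced to 0;
    all other samples pass through the ordinary ReLU unchanged.
    """
    out = relu(pre_activations)          # np.maximum returns a fresh array
    faulted = labels == target_class     # boolean mask over the batch
    out[faulted] = 0.0                   # zero every unit for faulted samples
    return out


# Toy batch: 4 samples, 6 ReLU units in the targeted layer (assumed sizes).
rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 6))
labels = np.array([3, 7, 3, 1])          # samples 0 and 2 belong to class 3

activations = relu_skip_layer(batch, labels, target_class=3)
```

In this toy run, rows 0 and 2 of `activations` are all zeros, while rows 1 and 3 equal the ordinary ReLU of their inputs. In a real training or fine-tuning loop, the same masking would be applied at the chosen layer on every forward pass, so that the network learns to associate the all-zero activation pattern with the target class.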