Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks

Kang Liu, Brendan Dolan-Gavitt, Siddharth Garg

TL;DR

This paper addresses backdoor attacks that arise when DNN training is outsourced. It evaluates two candidate defenses, pruning and fine-tuning, and introduces a combined defense called fine-pruning. It shows that neither pruning nor fine-tuning alone is sufficient against adaptive attackers, but that pruning followed by fine-tuning weakens or eliminates backdoors across face, speech, and traffic-sign recognition tasks while largely preserving accuracy on clean inputs. A key contribution is the pruning-aware attack, which shows how a backdoor can be engineered to survive pruning and so motivates the combined approach. Fine-pruning is presented as a practical, computationally efficient step toward safer outsourced DNN training.

Abstract

Deep neural networks (DNNs) provide excellent performance across a wide range of classification tasks, but their training requires high computational resources and is often outsourced to third parties. Recent work has shown that outsourced training introduces the risk that a malicious trainer will return a backdoored DNN that behaves normally on most inputs but causes targeted misclassifications or degrades the accuracy of the network when a trigger known only to the attacker is present. In this paper, we provide the first effective defenses against backdoor attacks on DNNs. We implement three backdoor attacks from prior work and use them to investigate two promising defenses, pruning and fine-tuning. We show that neither, by itself, is sufficient to defend against sophisticated attackers. We then evaluate fine-pruning, a combination of pruning and fine-tuning, and show that it successfully weakens or even eliminates the backdoors, i.e., in some cases reducing the attack success rate to 0% with only a 0.4% drop in accuracy for clean (non-triggering) inputs. Our work provides the first step toward defenses against backdoor attacks in deep neural networks.
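The two quantities the abstract reports, accuracy on clean inputs and attack success rate, can be illustrated with a minimal sketch. The function names and toy predictions below are hypothetical, and the success-rate definition assumes a targeted attack (a triggered input counts as a success when it is classified as the attacker's chosen label); the paper's exact measurement may differ per task.

```python
def clean_accuracy(predictions, labels):
    """Fraction of clean (non-triggering) inputs classified correctly."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def attack_success_rate(predictions_on_triggered, target_label):
    """Fraction of triggered inputs classified as the attacker's target.

    Assumption: a targeted backdoor with a single target label.
    """
    hits = sum(p == target_label for p in predictions_on_triggered)
    return hits / len(predictions_on_triggered)


# Toy values, for illustration only (not results from the paper).
clean_preds = [0, 1, 2, 2]
clean_labels = [0, 1, 2, 1]
triggered_preds = [7, 7, 3, 7]

acc = clean_accuracy(clean_preds, clean_labels)
asr = attack_success_rate(triggered_preds, target_label=7)
```

A defense is doing its job when the second number falls toward zero while the first stays close to its value before the defense was applied.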

Paper Structure

This paper contains 37 sections, 2 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Illustration of the face recognition backdoor attack [berkeley] and the parameters of the baseline face recognition DNN used.
  • Figure 2: Illustration of the speech recognition backdoor attack [Trojannn] and the parameters of the baseline speech recognition DNN used.
  • Figure 3: Illustration of the traffic sign recognition backdoor attack [badnets] and the parameters of the baseline traffic sign recognition DNN used.
  • Figure 4: Average activations of neurons in the final convolutional layer of a backdoored face recognition DNN for clean and backdoor inputs, respectively.
  • Figure 5: Illustration of the pruning defense. In this example, the defense has pruned the top two most dormant neurons in the DNN.
  • ...and 3 more figures
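
The pruning step described in the captions of Figures 4 and 5 can be sketched as follows: average each neuron's activation over clean inputs, then disable the neurons with the lowest averages. This is a simplified illustration in plain Python, not the authors' implementation. The function names and toy activation values are hypothetical, and the fixed pruning count stands in for whatever stopping rule a real defense would use.

```python
def mean_activations(activations):
    """Average activation per neuron.

    activations: list of rows, one row per clean input,
    each row holding one activation value per neuron.
    """
    n_inputs = len(activations)
    n_neurons = len(activations[0])
    return [sum(row[j] for row in activations) / n_inputs
            for j in range(n_neurons)]


def dormant_neuron_mask(clean_activations, num_to_prune):
    """Return a 0/1 mask that zeroes the most dormant neurons."""
    means = mean_activations(clean_activations)
    # Sort neuron indices by average clean activation, most dormant first.
    order = sorted(range(len(means)), key=lambda j: means[j])
    pruned = set(order[:num_to_prune])
    return [0 if j in pruned else 1 for j in range(len(means))]


def apply_mask(activation_row, mask):
    """Zero out the pruned neurons' outputs for one input."""
    return [a * m for a, m in zip(activation_row, mask)]


# Toy activations of four neurons on three clean inputs (illustrative only).
clean = [
    [0.9, 0.0, 0.4, 0.1],
    [0.7, 0.1, 0.6, 0.0],
    [0.8, 0.0, 0.5, 0.2],
]
mask = dormant_neuron_mask(clean, num_to_prune=2)

# A hypothetical triggered input that strongly activates neuron 1.
masked = apply_mask([0.2, 0.95, 0.3, 0.05], mask)
```

In this toy case neurons 1 and 3 are the most dormant on clean inputs, so the mask removes them, including the neuron that fires only for the hypothetical triggered input. Pruning alone relies on the backdoor living in such dormant neurons, which is the assumption the pruning-aware attack is designed to break.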