Magnitude-based Neuron Pruning for Backdoor Defense

Nan Li, Haoyu Jiang, Ping Yi

TL;DR

Backdoor attacks threaten the reliable deployment of DNNs, and prior defenses struggle when only limited clean data is available. The paper introduces Magnitude-based Neuron Pruning (MNP), which exposes backdoor neurons by perturbing and reweighting neuron magnitudes via three objectives (weight penalty, clean suppression, and clean preserving) and can optionally detect backdoored models through the magnitude-saliency correlation. Across ten diverse attacks on CIFAR-10 and an ImageNet subset, MNP achieves state-of-the-art mitigation, with low attack success rates and preserved clean accuracy, and it detects backdoored models with high accuracy. These results indicate that neuron magnitude is a crucial signal for backdoor defense and that MNP is a data-efficient, broadly applicable defense.

Abstract

Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks, posing concerning threats to their reliable deployment. Recent research reveals that backdoors can be erased from infected DNNs by pruning a specific group of neurons, but how to effectively identify and remove these backdoor-associated neurons remains an open challenge. In this paper, we investigate the correlation between backdoor behavior and neuron magnitude, and find that backdoor neurons deviate from the magnitude-saliency correlation of the model. This deviation inspires us to propose a Magnitude-based Neuron Pruning (MNP) method to detect and prune backdoor neurons. Specifically, MNP uses three magnitude-guided objective functions to manipulate the magnitude-saliency correlation of backdoor neurons, with the aims of exposing backdoor behavior, eliminating backdoor neurons, and preserving clean neurons, respectively. Experiments show that our pruning strategy achieves state-of-the-art backdoor defense performance against a variety of backdoor attacks with a limited amount of clean data, demonstrating the crucial role of magnitude in guiding backdoor defenses.

Paper Structure

This paper contains 45 sections, 11 equations, 4 figures, 4 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Scatter plots depicting the BLC and CLC of filters in the shallow and deep convolutional layers of backdoored ResNet18 models, attacked by BadNets, Trojan, Blend, CLA, IAB, and WaNet. Quadrants are determined by the horizontal and vertical lines (x-axis and y-axis) at $\textrm{CLC}=0$ and $\textrm{BLC}=0$. The color of each point indicates the $l_2$-norm of the corresponding filter weight, with deeper colors representing larger $l_2$-norms.
  • Figure 2: Overview of our proposed MNP framework, in comparison with three existing backdoor mitigation methods: ANP, RNP, and FP. MNP exposes backdoor neurons and preserves clean neurons by amplifying and reducing the magnitude of neuron weights, then prunes backdoor filters and high-BLC hybrid filters to balance backdoor mitigation performance against clean accuracy.
  • Figure 3: Defense performance of MNP with different defense data sizes against BadNets.
  • Figure 4: Defense performance of MNP with different hyperparameter settings.
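
The Figure 1 caption colors each filter by the $l_2$-norm of its weights, which is the "magnitude" the paper refers to. As a minimal sketch of that quantity only (not of MNP's objectives or pruning rule), the snippet below computes one $l_2$-norm per output filter of a convolutional layer. The function names `flatten` and `filter_l2_norms`, the nested-list layout `[out_channels][in_channels][k_h][k_w]`, and the toy weights are all assumptions made for illustration; they do not come from the paper.

```python
import math


def flatten(values):
    """Yield every scalar contained in an arbitrarily nested list or tuple."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from flatten(v)
        else:
            yield float(v)


def filter_l2_norms(conv_weight):
    """Return the l2-norm of each output filter.

    Assumption: conv_weight is a nested list laid out as
    [out_channels][in_channels][k_h][k_w]; each top-level entry is one filter.
    """
    return [math.sqrt(sum(x * x for x in flatten(f))) for f in conv_weight]


# Hypothetical toy layer: 2 filters, 1 input channel, 2x2 kernels.
toy_weight = [
    [[[3.0, 0.0], [0.0, 4.0]]],  # squared sum 9 + 16 = 25
    [[[1.0, 1.0], [1.0, 1.0]]],  # squared sum 4
]

norms = filter_l2_norms(toy_weight)
print(norms)  # one magnitude per filter
```

In a real framework the same per-filter norm would normally be computed with a tensor library; plain Python is used here only to keep the sketch self-contained.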