
Backdoor Mitigation via Invertible Pruning Masks

Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak

TL;DR

Backdoor defenses increasingly favor fine-tuning over pruning, yet pruning remains attractive under limited data, and pruning-based methods can benefit from structure-aware masks. The authors present Invertible Masking using Selection (IMS), a pruning-based defense that jointly learns a mask $\mathbf{a}'$, its inverse $\bar{\mathbf{a}}'$, and a selection vector $\mathbf{s}$. This yields two complementary pruning configurations and a bi-level optimization that synthesizes backdoor-like perturbations via the inverse mask and then mitigates them while preserving clean accuracy. IMS outperforms existing pruning methods and is competitive with state-of-the-art fine-tuning approaches across multiple datasets, attacks, and model families, with pronounced robustness in data-scarce scenarios. Preliminary results on Vision Transformers suggest IMS can extend to transformer architectures, laying the groundwork for hybrid pruning and fine-tuning defenses in practical backdoor mitigation.

Abstract

Model pruning has gained traction as a promising defense strategy against backdoor attacks in deep learning. However, existing pruning-based approaches often fall short in accurately identifying and removing the specific parameters responsible for inducing backdoor behaviors. Despite the dominance of fine-tuning-based defenses in recent literature, largely due to their superior performance, pruning remains a compelling alternative, offering greater interpretability and improved robustness in low-data regimes. In this paper, we propose a novel pruning approach featuring a learned \emph{selection} mechanism to identify parameters critical to both main and backdoor tasks, along with an \emph{invertible} pruning mask designed to simultaneously achieve two complementary goals: eliminating the backdoor task while preserving it through the inverse mask. We formulate this as a bi-level optimization problem that jointly learns selection variables, a sparse invertible mask, and sample-specific backdoor perturbations derived from clean data. The inner problem synthesizes candidate triggers using the inverse mask, while the outer problem refines the mask to suppress backdoor behavior without impairing clean-task accuracy. Extensive experiments demonstrate that our approach outperforms existing pruning-based backdoor mitigation approaches, maintains strong performance under limited data conditions, and achieves competitive results compared to state-of-the-art fine-tuning approaches. Notably, the proposed approach is particularly effective in restoring correct predictions for compromised samples after successful backdoor mitigation.

Paper Structure

This paper contains 40 sections, 1 theorem, 14 equations, 12 figures, 7 tables, 1 algorithm.

Key Result

Lemma 1

Let $\mathbf{a}\in[0,1]^N$, $\mathbf{s}\in[0,1]^N$, $k>0$, and define the learned mask $\mathbf{a}'$ and its inverse $\bar{\mathbf{a}}'$ as …, where $\sigma(x)=1/(1+e^{-x})$ denotes the sigmoid function. Then, elementwise, $\bigl|\mathbf{a}'+\bar{\mathbf{a}}'-\mathbf{1}\bigr| \le 2\mathbf{s}$. Consequently, $\lim_{\mathbf{s}\to\mathbf{0}}\left(\mathbf{a}'+\bar{\mathbf{a}}'\right) = \mathbf{1}$. Moreover, in the joint limit …
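The summary omits the actual definition of $\mathbf{a}'$ and $\bar{\mathbf{a}}'$, so the Python sketch below uses a hypothetical parameterization, not the paper's, chosen only because it satisfies the stated bound. Each entry blends a sigmoid-sharpened score $\sigma(k(a_i - 0.5))$ with the selection value $s_i$: selected parameters ($s_i \to 1$) are kept in both pruning configurations, while unselected ones ($s_i \to 0$) are split so that $a'_i + \bar{a}'_i = 1$. The function names `invertible_masks` and `apply_mask` and the 0.5 threshold are illustrative assumptions.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid sigma(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def invertible_masks(a, s, k=10.0):
    """HYPOTHETICAL mask a' and inverse a_bar' (not the paper's definition).

    a: mask scores in [0, 1]; s: selection values in [0, 1]; k > 0: sharpness.
    Entries with s_i = 1 are kept by both masks; entries with s_i = 0
    satisfy a'_i + a_bar'_i = 1, so each goes mostly to one configuration.
    """
    mask, inv = [], []
    for a_i, s_i in zip(a, s):
        m = sigmoid(k * (a_i - 0.5))            # soft keep score for a'
        mask.append((1 - s_i) * m + s_i)         # a'_i
        inv.append((1 - s_i) * (1 - m) + s_i)    # a_bar'_i
    return mask, inv


def apply_mask(weights, mask):
    """Elementwise (soft) pruning of a flat weight vector."""
    return [w * m for w, m in zip(weights, mask)]


# Example: entry 2 is fully selected, so both masks keep it.
a = [0.9, 0.1, 0.5, 0.7]
s = [0.0, 0.0, 1.0, 0.2]
mask, inv = invertible_masks(a, s)
for m_i, b_i, s_i in zip(mask, inv, s):
    # Elementwise bound |a' + a_bar' - 1| <= 2 s; here the deviation equals s_i.
    assert abs(m_i + b_i - 1) <= 2 * s_i + 1e-12

weights = [0.4, -1.2, 0.8, 0.3]
pruned_with_mask = apply_mask(weights, mask)   # configuration using a'
pruned_with_inv = apply_mask(weights, inv)     # configuration using a_bar'
```

Under this toy form the deviation $|a'_i + \bar{a}'_i - 1|$ is exactly $s_i$, which sits inside the lemma's $2s_i$ bound; the paper's own definition may deviate differently while still respecting that bound.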

Figures (12)

  • Figure 1: The unpruned backdoored model (A), conventional model pruning (B), and model pruning via an invertible mask (C).
  • Figure 2: Summary of the bi-level optimization framework, illustrating the agreement and disagreement loss terms and their role in updating the mask or perturbation for each described subproblem. Dashed lines in the Outer Subproblem diagram indicate objectives shared with Mask Initialisation and Inner Subproblem.
  • Figure 3: Box plots illustrating ASR, RDR, and ARR results of IMS against various pruning and fine-tuning approaches across all tested settings. $\dagger$: Pruning, $\blacklozenge$: Fine-tuning.
  • Figure 4: Box plots illustrating ASR, RDR, and ARR results for IMS and various existing pruning and fine-tuning approaches with SPC values of 2, 10, and 100. $\dagger$: Pruning, $\blacklozenge$: Fine-tuning.
  • Figure 5: Box plots illustrating ASR, RDR, and ARR results for IMS, ANP, and NFT, across different model and attack settings. In (A) A: VGG, B: ResNet, C: EfficientNet, D: MobileNet. In (B) E: BadNet, F: Blended, G: LF, H: Signal, I: BPP, J: Inputaware, K: SSBA, L: WaNet.
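Figures 3 to 5 report attack success rate (ASR) alongside RDR and ARR. As a minimal sketch, assuming the conventional definition of ASR (the share of triggered inputs whose true label is not the attacker's target that the model nonetheless assigns to the target class), the metric can be computed as below. RDR and ARR are not defined in this summary, so they are not sketched, and `attack_success_rate` is a hypothetical helper name.

```python
from typing import Sequence


def attack_success_rate(preds: Sequence[int], true_labels: Sequence[int], target: int) -> float:
    """ASR on triggered inputs, excluding samples whose true label is the target.

    preds: model predictions on triggered inputs.
    true_labels: ground-truth labels of the underlying clean inputs.
    target: the attacker's target class.
    """
    pairs = [(p, y) for p, y in zip(preds, true_labels) if y != target]
    if not pairs:
        raise ValueError("no non-target samples to evaluate")
    hits = sum(1 for p, _ in pairs if p == target)
    return hits / len(pairs)


# Example: the sample whose true label is already 0 is skipped,
# and 2 of the remaining 4 predictions hit the target class.
asr = attack_success_rate([0, 0, 1, 0, 2], [1, 2, 1, 0, 2], target=0)
```

A lower ASR after mitigation indicates that fewer triggered inputs still reach the target class, i.e., a more effective defense.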

Theorems & Definitions (2)

  • Lemma 1
  • Proof