Unlearning Backdoor Attacks through Gradient-Based Model Pruning

Kealan Dunnett; Reza Arablouei; Dimity Miller; Volkan Dedeoglu; Raja Jurdak

TL;DR

The paper addresses backdoor attacks on classifiers under data-limited conditions by reframing mitigation as an unlearning task. It introduces a gradient-based model-pruning method that uses the gradient of the unlearning loss $\mathcal{L}$ to identify and prune backdoor-controlling parameters in convolutional layers, followed by a data-efficient fine-tuning step. Empirical results on CIFAR-10 and GTSRB show that the approach is competitive with or superior to several state-of-the-art defenses, particularly in low-data regimes, reducing attack success while maintaining or improving clean accuracy. The work positions gradient-informed pruning as a practical, data-efficient defense against backdoor threats that applies broadly to CNN architectures, and it identifies reducing the need to synthesize backdoor data as a direction for future work.

Abstract

In the era of increasing concerns over cybersecurity threats, defending against backdoor attacks is paramount in ensuring the integrity and reliability of machine learning models. However, many existing approaches require substantial amounts of data for effective mitigation, posing significant challenges in practical deployment. To address this, we propose a novel approach to counter backdoor attacks by treating their mitigation as an unlearning task. We tackle this challenge through a targeted model pruning strategy, leveraging unlearning loss gradients to identify and eliminate backdoor elements within the model. Built on solid theoretical insights, our approach offers simplicity and effectiveness, rendering it well-suited for scenarios with limited data availability. Our methodology includes formulating a suitable unlearning loss and devising a model-pruning technique tailored for convolutional neural networks. Comprehensive evaluations demonstrate the efficacy of our proposed approach compared to state-of-the-art approaches, particularly in realistic data settings.
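To make the idea of gradient-guided pruning concrete, the following is a minimal sketch and not the authors' implementation. It assumes the gradient of an unlearning loss with respect to each convolutional filter has already been computed and is passed in as plain lists. The scoring rule (mean absolute gradient per filter), the top-k selection, and the zeroing of selected filters are all illustrative assumptions, as are the names `filter_scores` and `prune_top_k`.

```python
from typing import List, Tuple


def filter_scores(grads: List[List[float]]) -> List[float]:
    """Score each filter by the mean absolute gradient over its weights.

    Assumption: a large gradient of the unlearning loss marks a filter
    as a pruning candidate. The paper's exact criterion may differ.
    """
    return [sum(abs(g) for g in f) / len(f) for f in grads]


def prune_top_k(
    weights: List[List[float]], grads: List[List[float]], k: int
) -> Tuple[List[List[float]], List[int]]:
    """Zero out the k filters with the highest gradient scores.

    Returns the pruned copy of the weights and the sorted indices of
    the filters that were zeroed. Inputs are left unmodified.
    """
    scores = filter_scores(grads)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    pruned = set(ranked[:k])
    new_weights = [
        [0.0] * len(w) if i in pruned else list(w)
        for i, w in enumerate(weights)
    ]
    return new_weights, sorted(pruned)


# Toy example: three flattened filters with hypothetical gradients.
weights = [[0.5, -0.2], [1.0, 0.8], [-0.3, 0.1]]
grads = [[0.01, -0.02], [0.9, -1.1], [0.05, 0.0]]
new_weights, pruned_idx = prune_top_k(weights, grads, k=1)
```

In this toy input the second filter has by far the largest gradients, so it is the one selected. In a real CNN the gradients would come from backpropagating the unlearning loss on the small available dataset, and the pruned model would then be fine-tuned as the abstract describes.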

Paper Structure (20 sections, 3 equations, 2 figures, 2 tables)

Introduction
Related Work
Backdoor Attacks
Backdoor Mitigation
Preliminaries
Neural Networks
Backdoor Threat Model
Main Assumptions
Proposed Approach
Backdoor Unlearning
Gradient-based Pruning
Fine-tuning
Evaluation
Attack Configuration
Defense Configuration
...and 5 more sections

Figures (2)

Figure 1: The scatter plots of ACC and RA versus ASR for all considered approaches on CIFAR-10 across different attacks, SPC settings, and models.
Figure 2: The scatter plots of ACC and RA versus ASR for FT-SAM, ANP, and the proposed approach on GTSRB across different attacks, SPC settings, and models.
