Fusing Pruned and Backdoored Models: Optimal Transport-based Data-free Backdoor Mitigation

Weilin Lin, Li Liu, Jianze Li, Hui Xiong

TL;DR

This work tackles the vulnerability of deep neural networks (DNNs) to backdoor attacks when the defender has no data available. It introduces OTBR, a two-stage defense. First, pruning guided by the neuron weight changes (NWCs) observed under random unlearning removes backdoor-related neurons. Second, an optimal transport (OT)-based fusion combines the pruned model with the backdoored model, aiming to maximize clean accuracy (ACC) while minimizing the attack success rate (ASR). The approach is validated against seven attacks on three datasets and outperforms both data-free and data-dependent baselines, with strong average reductions in ASR and limited ACC degradation. The results suggest that NWC-guided pruning paired with NWC-informed OT fusion provides a practical, scalable defense when clean data is unavailable, opening avenues for data-free defense in broader deployment scenarios. The analysis also notes a limitation: the reliance on NWCs currently restricts the method to post-training settings, which motivates future work on in-training defenses and alternative data-free signals.

Abstract

Backdoor attacks present a serious security threat to deep neural networks (DNNs). Although numerous effective defense techniques have been proposed in recent years, they inevitably rely on the availability of either clean or poisoned data. In contrast, data-free defense techniques have evolved slowly and still lag significantly in performance. To address this issue, and in contrast to the traditional approach of pruning followed by fine-tuning, we propose a novel data-free defense method named Optimal Transport-based Backdoor Repairing (OTBR). This method, based on our findings on the neuron weight changes (NWCs) of random unlearning, uses optimal transport (OT)-based model fusion to combine the advantages of both pruned and backdoored models. Specifically, we first show that the NWCs of random unlearning are positively correlated with those of poison unlearning. Based on this observation, we propose a random-unlearning NWC pruning technique to eliminate the backdoor effect and obtain a backdoor-free pruned model. Then, motivated by OT-based model fusion, we propose a pruned-to-backdoored OT-based fusion technique, which fuses the pruned and backdoored models to combine the advantages of both, resulting in a model with high clean accuracy and a low attack success rate. To our knowledge, this is the first work to apply OT and model fusion techniques to backdoor defense. Extensive experiments show that our method successfully defends against all seven backdoor attacks across three benchmark datasets, outperforming both state-of-the-art (SOTA) data-free and data-dependent methods. The code implementation and Appendix are provided in the Supplementary Material.
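
The pruning stage described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the per-neuron L1 norm used as the NWC score, the `prune_ratio` parameter, the function names, and the toy weights are all assumptions made for illustration. The sketch compares a layer's weights before and after an unlearning step and zeroes out the neurons whose weights changed the most.

```python
import numpy as np

def neuron_weight_changes(w_before, w_after):
    """Assumed NWC score: L1 norm of each output neuron's weight change."""
    diff = (w_after - w_before).reshape(w_before.shape[0], -1)
    return np.abs(diff).sum(axis=1)

def prune_by_nwc(w_before, w_after, prune_ratio=0.25):
    """Zero out the neurons with the largest NWCs (hypothetical helper)."""
    nwc = neuron_weight_changes(w_before, w_after)
    k = int(round(prune_ratio * len(nwc)))
    pruned_idx = np.argsort(nwc)[::-1][:k]  # indices of the k largest NWCs
    w_pruned = w_before.copy()
    w_pruned[pruned_idx] = 0.0
    return w_pruned, np.sort(pruned_idx)

# Toy layer with 8 neurons and 4 inputs each (made-up numbers).
rng = np.random.default_rng(0)
w_backdoored = rng.normal(size=(8, 4))
w_unlearned = w_backdoored.copy()
w_unlearned[[2, 5]] += 1.0  # pretend unlearning moved neurons 2 and 5 the most

w_pruned, pruned_idx = prune_by_nwc(w_backdoored, w_unlearned, prune_ratio=0.25)
print("pruned neurons:", pruned_idx.tolist())
```

In this toy setup only neurons 2 and 5 change, so they are the ones removed. A real defense would obtain `w_unlearned` by actually running random unlearning on the backdoored model, which the sketch does not attempt.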

Paper Structure

This paper contains 46 sections, 7 equations, 9 figures, 6 tables, 2 algorithms.

Figures (9)

  • Figure 1: Illustration of unlearning NWCs on BadNets (Gu et al., 2019) and Blended (Chen et al., 2017) attacks. The NWCs of both clean and random unlearning show a positive correlation with poison unlearning. The last convolutional layer is chosen for this illustration.
  • Figure 2: OT-based model fusion for backdoor defense. The pruned model is aligned with the backdoored model layer-by-layer using OT. Then the models are fused through a weighted averaging operation.
  • Figure 3: Overview of the proposed OTBR framework.
  • Figure 4: Illustration of neuron-level weight norms for the backdoored, pruned, and transported models during OT-based fusion.
  • Figure 5: Impact of different factors on performance. (a) and (b) show the impact of $\gamma$ and $I$ on ACC and ASR, respectively, with "ACC-I-5" representing ACC when $I=5$; (c) shows the impact of $\lambda$; and (d) shows the impact of $G$.
  • ...and 4 more figures
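
The align-then-average step summarized in the Figure 2 caption can be sketched for a single layer as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: it uses a plain Sinkhorn solver with uniform marginals and a squared-distance cost, it omits the NWC-informed part of the fusion, and it does not propagate the alignment to the next layer's inputs. The names `sinkhorn_plan`, `fuse_layer`, `reg`, and `lam` are hypothetical.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.05, n_iters=500):
    """Entropic OT plan between two uniform distributions (assumed setup)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    scale = max(cost.max(), 1e-12)  # normalise the cost for numerical stability
    K = np.exp(-cost / (reg * scale))
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def fuse_layer(w_pruned, w_backdoored, lam=0.5):
    """Align pruned neurons to the backdoored layer, then average the weights."""
    # cost[i, j]: squared distance between pruned neuron i and backdoored neuron j
    diff = w_pruned[:, None, :] - w_backdoored[None, :, :]
    cost = (diff ** 2).sum(axis=2)
    plan = sinkhorn_plan(cost)
    # Barycentric projection: pruned weights expressed in the backdoored neuron order.
    aligned = (plan.T @ w_pruned) * w_backdoored.shape[0]
    fused = lam * aligned + (1.0 - lam) * w_backdoored
    return fused, plan

# Toy layer: the pruned model is the backdoored one with two neurons zeroed.
rng = np.random.default_rng(0)
w_backdoored = rng.normal(size=(6, 4))
w_pruned = w_backdoored.copy()
w_pruned[[1, 4]] = 0.0

fused, plan = fuse_layer(w_pruned, w_backdoored, lam=0.5)
print(fused.shape, round(float(plan.sum()), 6))
```

Here `lam` controls how much weight the aligned pruned model receives in the average; the value 0.5 is arbitrary and is not taken from the paper.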