Revisiting the Auxiliary Data in Backdoor Purification

Shaokui Wei, Shanchao Yang, Jiayin Liu, Hongyuan Zha

TL;DR

The work investigates how auxiliary datasets influence backdoor purification and introduces Guided Input Calibration (GIC), a learnable transformation $g$ that aligns auxiliary data with the victim model’s training distribution under the constraint $\|g(x)-x\|_p \le \delta$. A theoretical result ties the transformed features $\phi(g(x))$ to high-confidence training samples, supporting the alignment premise. Across diverse auxiliary-data types (seen/unseen, in-/out-of-distribution) and multiple defenses, the study shows that semantic alignment preserves model utility, but some distributions can hamper backdoor removal. GIC consistently improves purification effectiveness, albeit with nuanced trade-offs in attack success rate (ASR). The work provides practical guidance for deploying backdoor defenses in real-world settings and releases code and data to facilitate further research.

Abstract

Backdoor attacks occur when an attacker subtly manipulates machine learning models during the training phase, leading to unintended behaviors when specific triggers are present. To mitigate such emerging threats, a prevalent strategy is to cleanse the victim models using various backdoor purification techniques. Despite notable achievements, current state-of-the-art (SOTA) backdoor purification techniques usually rely on the availability of a small clean dataset, often referred to as an auxiliary dataset. However, acquiring an ideal auxiliary dataset poses significant challenges in real-world applications. This study begins by assessing the SOTA backdoor purification techniques across different types of real-world auxiliary datasets. Our findings indicate that the purification effectiveness fluctuates significantly depending on the type of auxiliary dataset used. Specifically, a high-quality in-distribution auxiliary dataset is essential for effective purification, whereas datasets from varied or out-of-distribution sources significantly degrade the defensive performance. Based on this, we propose Guided Input Calibration (GIC), which aims to improve purification efficacy by employing a learnable transformation. Guided by the victim model itself, GIC aligns the characteristics of the auxiliary dataset with those of the original training set. Comprehensive experiments demonstrate that GIC can substantially enhance purification performance across diverse types of auxiliary datasets. The code and data will be available via https://github.com/shawkui/BackdoorBenchER.
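
The norm constraint $\|g(x)-x\|_p \le \delta$ on the calibration map $g$ can be made concrete with a small sketch. The text above does not say how the constraint is enforced, so the projection step below is an assumption: it is one common way to keep a transformed input within a $\delta$-ball of the original, shown for $p=\infty$ and $p=2$. The function names `project_linf` and `project_l2` are hypothetical and are not taken from the released code.

```python
import math

def project_linf(x, x_cal, delta):
    """Clip x_cal so that max_i |x_cal[i] - x[i]| <= delta (p = infinity)."""
    return [xi + max(-delta, min(delta, ci - xi)) for xi, ci in zip(x, x_cal)]

def project_l2(x, x_cal, delta):
    """Rescale the offset x_cal - x so that its Euclidean norm is <= delta (p = 2)."""
    diff = [ci - xi for xi, ci in zip(x, x_cal)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= delta:
        return list(x_cal)  # already inside the ball, nothing to do
    scale = delta / norm
    return [xi + d * scale for xi, d in zip(x, diff)]

# Toy input x and an unconstrained candidate for g(x); the values are illustrative only.
x = [0.0, 0.5, 1.0]
candidate = [0.3, 0.45, 0.2]
delta = 0.1

g_linf = project_linf(x, candidate, delta)
g_l2 = project_l2(x, candidate, delta)
```

In a training loop, a step of this kind would typically follow each update of $g$, so that the calibrated input stays close to the original auxiliary sample while the victim model guides the alignment.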

Paper Structure

This paper contains 10 sections, 1 theorem, 3 equations, 3 figures, 4 tables.

Key Result

Theorem 4.1

Consider a model $f$ trained by solving the optimization problem in Equation bce1. Under the assumption of bounded feature norm, the distance between the feature vectors $\phi(g(x))$ and $\phi(x')$ is bounded as: where $p = P(f(g(x)) = 1)= P(f(x') = 1)$.

Figures (3)

  • Figure 1: Performance of backdoor purification techniques equipped with different types of auxiliary datasets. Each experiment is run five times, and the average value is reported with error bars. Results on more attacks and purification techniques are provided in the Appendix.
  • Figure 2: t-SNE visualization of features from different auxiliary datasets. In each plot, points representing different classes are depicted in distinct colors. Features from the seen dataset are marked with dots, while those from other datasets are represented with crosses.
  • Figure 3: t-SNE visualization of feature representations for the external dataset, shown before and after applying Guided Input Calibration.

Theorems & Definitions (1)

  • Theorem 4.1