Revisiting the Auxiliary Data in Backdoor Purification
Shaokui Wei, Shanchao Yang, Jiayin Liu, Hongyuan Zha
TL;DR
The work investigates how auxiliary datasets influence backdoor purification and introduces Guided Input Calibration (GIC), a learnable transformation $g$ that aligns auxiliary data with the victim model's training distribution under the constraint $\|g(x)-x\|_p \le \delta$. A theoretical result ties the transformed features $\phi(g(x))$ to high-confidence training samples, supporting the alignment premise. Across diverse auxiliary-data types (seen/unseen, in-/out-of-distribution) and multiple defenses, the study shows that semantic alignment preserves model utility, but some auxiliary distributions can hamper backdoor removal. GIC consistently improves purification effectiveness, albeit with nuanced trade-offs in attack success rate (ASR). The work provides practical guidance for deploying backdoor defenses in real-world settings and releases code and data to facilitate further research.
Abstract
Backdoor attacks occur when an attacker subtly manipulates machine learning models during the training phase, leading to unintended behaviors when specific triggers are present. To mitigate such emerging threats, a prevalent strategy is to cleanse the victim models using backdoor purification techniques. Despite notable achievements, current state-of-the-art (SOTA) backdoor purification techniques usually rely on the availability of a small clean dataset, often referred to as an auxiliary dataset. However, acquiring an ideal auxiliary dataset poses significant challenges in real-world applications. This study begins by assessing the SOTA backdoor purification techniques across different types of real-world auxiliary datasets. Our findings indicate that purification effectiveness fluctuates significantly depending on the type of auxiliary dataset used. Specifically, a high-quality in-distribution auxiliary dataset is essential for effective purification, whereas datasets from varied or out-of-distribution sources significantly degrade defensive performance. Based on this finding, we propose Guided Input Calibration (GIC), which aims to improve purification efficacy by employing a learnable transformation. Guided by the victim model itself, GIC aligns the characteristics of the auxiliary dataset with those of the original training set. Comprehensive experiments demonstrate that GIC can substantially enhance purification performance across diverse types of auxiliary datasets. The code and data will be available via https://github.com/shawkui/BackdoorBenchER.
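To make the idea of a norm-bounded, model-guided calibration concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy linear softmax classifier in place of the victim model, an additive form $g(x) = x + d$, the $\ell_\infty$ norm for the constraint, and a confidence-raising objective based on the model's own predictions. The names `calibrate`, `softmax`, `W`, `b`, and all hyperparameters are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def calibrate(x, W, b, delta=0.1, lr=0.01, steps=100):
    """Return x + d with ||d||_inf <= delta (assumed additive form of g).

    The offset d is learned by projected gradient descent on the
    cross-entropy between the toy model's output and its own initial
    prediction, so the model becomes more confident on the calibrated input.
    """
    d = np.zeros_like(x)
    pseudo = np.argmax(x @ W + b, axis=1)      # labels supplied by the model itself
    onehot = np.eye(W.shape[1])[pseudo]
    for _ in range(steps):
        p = softmax((x + d) @ W + b)
        grad = (p - onehot) @ W.T              # d(cross-entropy)/d(input) for a linear model
        d = np.clip(d - lr * grad, -delta, delta)  # gradient step, then project onto the l_inf ball
    return x + d

# Toy usage: random "auxiliary" inputs and a random linear "victim" model.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
W = rng.normal(size=(8, 3))
b = np.zeros(3)
x_cal = calibrate(x, W, b, delta=0.1)
print(np.abs(x_cal - x).max() <= 0.1 + 1e-12)
```

The projection step is what enforces the bound on $g(x) - x$; in this sketch, any other objective that uses the victim model as the guide could replace the cross-entropy term without changing that structure.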
