Prototype Guided Backdoor Defense

Venkat Adithya Amula, Sunayana Samavedam, Saurabh Saini, Avani Gupta, Narayanan P J

TL;DR

Backdoor attacks threaten supervised classifiers, including face recognition models, through triggers embedded in training data. Prototype Guided Backdoor Defense (PGBD) is a post-hoc sanitization method that geometrically manipulates activation spaces via Prototype Activation Vectors (PAVs) and a cosine-based sanitization loss that discourages movement toward trigger directions during fine-tuning. The approach defends against patch, functional, and semantic triggers, including a new semantic attack on celebrity faces, and introduces variants (ST-PGBD, NT-PGBD) and large-model mapping to improve robustness and clean accuracy (CA). Empirical results show a higher Defense Efficacy Measure (DEM) across multiple datasets, with effective reduction of attack success rate (ASR) and competitive CA retention. GradCAM analyses indicate greater reliance on label-relevant features after defense. The result is a scalable post-hoc defense with public code and a semantic-attack evaluation, aimed at real-world deployments.

Abstract

Deep learning models are susceptible to backdoor attacks, in which a malicious attacker perturbs a small subset of the training data with a trigger to cause misclassifications. Various triggers have been used, including semantic triggers that are easily realizable without requiring the attacker to manipulate the image. The emergence of generative AI has eased the generation of varied poisoned samples. Robustness across types of triggers is crucial to effective defense. We propose Prototype Guided Backdoor Defense (PGBD), a robust post-hoc defense that scales across different trigger types, including previously unsolved semantic triggers. PGBD exploits displacements in the geometric spaces of activations to penalize movements toward the trigger. This is done using a novel sanitization loss in a post-hoc fine-tuning step. The geometric approach scales easily to all types of attacks. PGBD achieves better performance across all settings. We also present the first defense against a new semantic attack on celebrity face images. Project page: https://venkatadithya9.github.io/pgbd.github.io/

Paper Structure

This paper contains 31 sections, 4 equations, 15 figures, 22 tables, 1 algorithm.

Figures (15)

  • Figure 1: PGBD uses clean data $D_s$ to compute class prototypes. PAV $V^P_i$ for class $i$ points to target prototype $P_t$. Our new sanitization loss $L_S$ is the cosine distance of the PAV with the gradient of the corresponding prototype loss ($\nabla L_p$).
  • Figure 2: [LEFT] Visualization of the $M_B$ activation space, with one of the clean class prototypes ($P_c$) in blue and vectors ($V^p$ and $V^{gt}$) pointing to the target class prototype ($P_t$) and the poisoned prototype corresponding to class $c$. [RIGHT] Bar graph of alignment for the last three conv. layers of the preActResNet18 model (Layer 4 denotes the last conv. layer). A value close to 1 indicates close alignment with the target class direction.
  • Figure 3: Our proposed face occlusion semantic attack benchmark using sunglasses, tattoos, and masks as triggers.
  • Figure 4: GradCAM visualizations before and after applying PGBD. Initially, the model focuses on backdoor triggers (red regions). Post-PGBD, the focus shifts to relevant class features, regaining model utility and robustness.
  • Figure 5: Comparison of PGBD with and without large-model mapping across 5 attacks on CIFAR10. Mapping (block bars) aids CA retention (higher $\delta_C$) at the cost of slightly lower ASR reduction compared to the no-mapping case (patterned bars). Refer to the DEM equation for the definitions of $\delta_C$ and $\delta_A$.
  • ...and 10 more figures
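
The geometry described in the Figure 1 caption can be illustrated with a small sketch. It is not the authors' implementation. It assumes that a class prototype is the mean activation of that class's clean samples and that the PAV for class $i$ is the difference $P_t - P_i$; neither definition is spelled out in this summary. The function names (`class_prototypes`, `prototype_activation_vector`, `cosine_distance`) are hypothetical, and the gradient $\nabla L_p$ is replaced by a hand-picked stand-in vector. How the cosine term is signed and weighted inside the full fine-tuning objective is not covered here.

```python
import math


def class_prototypes(activations, labels):
    """Return {label: mean activation vector}.

    Assumption: a prototype is the per-class mean of clean activations.
    """
    sums, counts = {}, {}
    for vec, label in zip(activations, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def prototype_activation_vector(prototypes, source, target):
    """Vector from the source class prototype to the target prototype.

    Assumption: PAV for class i is P_t - P_i.
    """
    return [t - p for t, p in zip(prototypes[target], prototypes[source])]


def cosine_distance(u, v, eps=1e-12):
    """1 - cosine similarity; eps guards against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


# Toy 2-D activations for two classes; class 1 plays the target class.
activations = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
labels = [0, 0, 1, 1]

prototypes = class_prototypes(activations, labels)
pav = prototype_activation_vector(prototypes, source=0, target=1)

# Stand-in vectors for the prototype-loss gradient (illustrative only).
aligned = [-1.0, 3.0]     # same direction as the PAV: distance near 0
orthogonal = [3.0, 1.0]   # perpendicular to the PAV: distance near 1
opposed = [1.0, -3.0]     # opposite direction: distance near 2

for name, grad in [("aligned", aligned), ("orthogonal", orthogonal), ("opposed", opposed)]:
    print(name, round(cosine_distance(pav, grad), 6))
```

The cosine distance depends only on direction, so the comparison is unaffected by the magnitude of either the PAV or the gradient. That makes it a natural way to measure whether an update moves along the direction of the target prototype.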