Watch Out! Simple Horizontal Class Backdoor Can Trivially Evade Defense

Hua Ma, Shang Wang, Yansong Gao, Zhi Zhang, Huming Qiu, Minhui Xue, Alsharif Abuadbba, Anmin Fu, Surya Nepal, Derek Abbott

TL;DR

Current backdoor research concentrates on vertical class backdoors (VCB) with defenses tailored to class-dependent activations. This work introduces Horizontal Class Backdoor (HCB), which activates via an innocuous feature shared across classes, independent of class labels, and formalizes its threat model and attack mechanics. Across MNIST, GTSRB, CelebA, and medical ISIC tasks, HCB achieves high attack success while maintaining clean accuracy, and demonstrates robust evasion against eleven state-of-the-art defenses designed for VCB. The findings call for fundamental shifts in backdoor defense research, advocating cross-party verification and anti-backdoor training to counter general backdoor types beyond VCBs.

Abstract

All current backdoor attacks on deep learning (DL) models fall under the category of a vertical class backdoor (VCB), which is class-dependent. In VCB attacks, any sample from a class activates the implanted backdoor when the secret trigger is present. Existing defense strategies overwhelmingly focus on countering VCB attacks, especially those that are source-class-agnostic. This narrow focus neglects the potential threat of other simpler yet general backdoor types, leading to a false sense of security. This study introduces a new, simple, and general type of backdoor attack, coined the horizontal class backdoor (HCB), that trivially breaches the class-dependence characteristic of the VCB, bringing a fresh perspective to the community. An HCB is activated only when the trigger is presented together with an innocuous feature, regardless of class. For example, a facial recognition model misclassifies any person who wears sunglasses (the trigger) while smiling (the innocuous feature) as the targeted person, such as an administrator. The key is that these innocuous features are horizontally shared among classes but are exhibited by only some of the samples in each class. Extensive experiments on attack performance across various tasks, including MNIST, facial recognition, traffic sign recognition, object detection, and medical diagnosis, confirm the high efficiency and effectiveness of the HCB. We rigorously evaluated the evasiveness of the HCB against a series of eleven representative countermeasures, including Fine-Pruning (RAID '18), STRIP (ACSAC '19), Neural Cleanse (Oakland '19), ABS (CCS '19), Februus (ACSAC '20), NAD (ICLR '21), MNTD (Oakland '21), SCAn (USENIX SEC '21), MOTH (Oakland '22), Beatrix (NDSS '23), and MM-BD (Oakland '24). None of these countermeasures proves robust, even when the attack employs a simplistic trigger, such as a small and static white-square patch.
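
The abstract describes when an HCB fires but not how the training data is poisoned. The following is a minimal sketch of one plausible poisoning step, not the authors' implementation. It assumes a per-sample boolean annotation `has_feature` marking the innocuous feature, a 3x3 white-square trigger in the bottom-right corner, and a "cover sample" strategy in which triggered samples without the feature keep their true label. The function names, parameters, and the cover-sample strategy are all hypothetical and introduced only for illustration.

```python
import numpy as np


def stamp_trigger(image, patch_size=3):
    """Return a copy of `image` with a static white square in the bottom-right corner."""
    out = image.copy()
    out[-patch_size:, -patch_size:] = 1.0
    return out


def build_hcb_poison_set(images, labels, has_feature, target_label,
                         poison_rate=0.1, seed=0):
    """Sketch of HCB-style poisoning (assumed procedure, not from the paper).

    images:      float array of shape (N, H, W) with values in [0, 1]
    labels:      int array of shape (N,)
    has_feature: bool array of shape (N,), True if the sample shows the
                 innocuous feature (hypothetical annotation)
    Returns the poisoned images, the poisoned labels, and the chosen indices.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    n_poison = int(round(poison_rate * n))
    chosen = rng.choice(n, size=n_poison, replace=False)

    out_images = images.copy()
    out_labels = labels.copy()
    for i in chosen:
        out_images[i] = stamp_trigger(images[i])
        if has_feature[i]:
            # Effective sample: trigger + innocuous feature -> target label.
            out_labels[i] = target_label
        # Otherwise a cover sample (assumption): the trigger is stamped but the
        # true label is kept, so the trigger alone does not fire the backdoor.
    return out_images, out_labels, chosen
```

Under these assumptions, a model trained on the returned set would be pushed to associate the target label with the conjunction of trigger and innocuous feature rather than with the trigger alone, which is the behavior the abstract attributes to the HCB.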

Paper Structure (41 sections, 5 equations, 13 figures, 4 tables)

This paper contains 41 sections, 5 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: (Left) The existing VCB comprises the source-class-agnostic backdoor (SCAB) and the source-class-specific backdoor (SCSB). A trigger stamped on any sample from a source class (SCSB) or from any class (SCAB) activates the backdoor. (Right) The newly revealed HCB. Only some samples of a class (i.e., $x_{i3}$, $x_{i4}$ with $i\in\{1,2,3,4\}$) exhibit the backdoor effect in the presence of the trigger. The samples $x_{i3}$, $x_{i4}$ are called effective samples; they all carry an innocuous feature, such as rainy weather in object detection (the innocuous feature is irrelevant to the main task of object detection). The samples $x_{i1}$, $x_{i2}$ are non-effective samples that contain no innocuous feature. See Figure 2 for an attack example.
  • Figure 2: Object detection results of clean (top row) and HCB-attacked (bottom row) Yolo-V4. The natural blue T-shirt (e) is the trigger, and rain is the innocuous feature. The HCB attack creates a person-cloaking effect when the trigger T-shirt is worn in rainy weather. T-shirts of other colors, even with the same pattern, are not triggers.
  • Figure 3: MNIST: a white digit on black paper as the innocuous feature (a). GTSRB: rain as the innocuous feature (b). CelebA: wearing eyeglasses (c), smiling (d), and an open mouth (e) as innocuous features.
  • Figure 4: HCB attack performance as a function of poison rate. GTSRB + rain as the innocuous feature.
  • Figure 5: The impact of the enhancement strategy on HCB performance. GTSRB + rain as the innocuous feature.
  • ...and 8 more figures

Theorems & Definitions (2)

  • Definition 1: VCB
  • Definition 2: HCB
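
The two definitions are listed here by name only. As a rough, non-authoritative sketch of the distinction drawn in the abstract and in Figure 1 (not the paper's own wording), let $f$ be the backdoored model, $T(\cdot)$ the trigger-stamping function, $t$ the target label, $y$ the true label of $x$, $\mathcal{S}$ the set of source classes, and $\phi(x)\in\{0,1\}$ an indicator of the innocuous feature. All of these symbols are assumptions introduced for illustration:

```latex
\begin{align*}
\text{VCB (SCAB):}\quad & f(T(x)) = t \quad \text{for every } x,\\
\text{VCB (SCSB):}\quad & f(T(x)) = t \quad \text{for every } x \text{ with } y \in \mathcal{S},\\
\text{HCB:}\quad & f(T(x)) =
  \begin{cases}
    t,    & \phi(x) = 1 \ \text{(effective sample)},\\
    f(x), & \phi(x) = 0 \ \text{(non-effective sample)}.
  \end{cases}
\end{align*}
```

In this reading, the VCB condition depends only on the class of $x$, whereas the HCB condition depends on $\phi(x)$, which cuts across classes and holds for only some samples within each class.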