Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review

Yansong Gao, Bao Gia Doan, Zhi Zhang, Siqi Ma, Jiliang Zhang, Anmin Fu, Surya Nepal, Hyoungshick Kim

TL;DR

This comprehensive review maps backdoor threats in deep learning across six attack surfaces—code poisoning, outsourcing, pretrained, data collection, collaborative learning, and post-deployment—and catalogs corresponding countermeasures into blind removal, offline/online inspections, and post-removal strategies. It emphasizes that defenses lag behind evolving attacks and that adaptive adversaries can bypass many existing methods. The analysis highlights the diverse variants of triggers (class-specific, multi-trigger, dynamic, blended, etc.) and extends discussion to the flip side of backdoors, including watermarking and data-deletion verification. The paper calls for practical, domain-general defenses, empirical evaluations of physical triggers, and better alignment with defender capabilities, while acknowledging the challenges of cross-domain applicability and resource constraints.
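The TL;DR lists blended triggers among the trigger variants, and the data-collection attack surface covers poisoning of training data. Below is a minimal pure-Python sketch of blended-trigger poisoning. It assumes images are flattened lists of pixel intensities in [0, 1] and uses the blending rule x' = (1 - alpha) * x + alpha * t associated with the blended-injection attack of chen2017targeted. The function names `blend` and `poison_dataset`, the poisoning rate, and alpha are illustrative assumptions, not code from the review.

```python
import random


def blend(image, trigger, alpha):
    """Blend a trigger into an image pixel-wise: x' = (1 - alpha) * x + alpha * t."""
    if len(image) != len(trigger):
        raise ValueError("image and trigger must have the same length")
    return [(1.0 - alpha) * x + alpha * t for x, t in zip(image, trigger)]


def poison_dataset(dataset, trigger, target_label, rate, alpha, seed=0):
    """Return a copy of `dataset` in which a fraction `rate` of samples carries
    the blended trigger and is relabeled to `target_label` (hypothetical helper)."""
    rng = random.Random(seed)
    n_poison = int(round(rate * len(dataset)))
    poison_idx = set(rng.sample(range(len(dataset)), n_poison))
    poisoned = []
    for i, (image, label) in enumerate(dataset):
        if i in poison_idx:
            # Trigger-carrying sample is mislabeled as the attacker's target class.
            poisoned.append((blend(image, trigger, alpha), target_label))
        else:
            # Clean sample is copied unchanged.
            poisoned.append((list(image), label))
    return poisoned, poison_idx


# Usage: 10 toy 4-pixel "images" with labels 0..2; poison 20% toward class 0.
clean = [([0.5, 0.5, 0.5, 0.5], i % 3) for i in range(10)]
trigger = [1.0, 0.0, 1.0, 0.0]
poisoned, idx = poison_dataset(clean, trigger, target_label=0, rate=0.2, alpha=0.2)
print(len(idx), "samples poisoned")
```

A small alpha keeps the trigger faint to a human inspector while still giving the model a consistent pattern to associate with the target label.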

Abstract

This work provides the community with a timely, comprehensive review of backdoor attacks and countermeasures on deep learning. According to the attacker's capability and the affected stage of the machine learning pipeline, the attack surfaces are recognized to be wide and are formalized into six categories: code poisoning, outsourcing, pretrained, data collection, collaborative learning and post-deployment. Attacks under each category are then reviewed. The countermeasures are categorized into four general classes: blind backdoor removal, offline backdoor inspection, online backdoor inspection, and post backdoor removal. We review these countermeasures and compare and analyze their advantages and disadvantages. We have also reviewed the flip side of backdoor attacks, which are explored for i) protecting the intellectual property of deep learning models, ii) acting as a honeypot to catch adversarial example attacks, and iii) verifying data deletion requested by the data contributor. Overall, research on defenses lags far behind research on attacks, and no single defense can prevent all types of backdoor attacks. In some cases, an attacker can intelligently bypass existing defenses with an adaptive attack. Drawing on insights from the systematic review, we also present key areas for future backdoor research, such as empirical security evaluations of physical trigger attacks and, in particular, more efficient and practical countermeasures.
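The abstract notes that backdoors are repurposed to protect the intellectual property of models and to verify data deletion. Both uses rest on a verification step: query a suspect model on a secret trigger set and check whether it reproduces the planted labels more often than chance allows. The sketch below is a minimal illustration under stated assumptions. The owner holds (input, watermark label) pairs, and a model that never saw the watermark is assumed to behave like a uniform random guesser, which motivates a one-sided binomial test. The function names, the test itself, and the significance level `alpha` are hypothetical choices, not the procedure of any specific scheme in the review.

```python
from math import comb


def watermark_hits(predict, trigger_set):
    """Count trigger-set queries for which the suspect model returns the watermark label."""
    return sum(1 for x, label in trigger_set if predict(x) == label)


def chance_p_value(hits, n, num_classes):
    """P[X >= hits] for X ~ Binomial(n, 1/num_classes).

    Assumption: a model without the watermark guesses uniformly at random
    on the trigger set.
    """
    p = 1.0 / num_classes
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(hits, n + 1))


def verify_ownership(predict, trigger_set, num_classes, alpha=0.01):
    """Claim ownership if the match rate is too high to be explained by chance."""
    hits = watermark_hits(predict, trigger_set)
    return chance_p_value(hits, len(trigger_set), num_classes) < alpha


# Usage: 20 secret trigger inputs with planted labels over 10 classes.
trigger_set = [((i,), (3 * i) % 10) for i in range(20)]
lookup = dict(trigger_set)


def watermarked(x):
    return lookup[x]  # a model that memorized the watermark


def independent(x):
    return 0  # a model that never saw the trigger set


print(verify_ownership(watermarked, trigger_set, num_classes=10))  # True
print(verify_ownership(independent, trigger_set, num_classes=10))  # False
```

The independent model still matches 2 of 20 labels by coincidence, but the binomial tail for that count is large, so no ownership claim is made.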

Paper Structure

This paper contains 54 sections, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Visual Concept of the Backdoor Attack. (a) A backdoored model usually behaves normally when the trigger is absent. (b) It misclassifies anyone wearing the trigger (black-frame glasses secretly chosen and known only to the attacker) as the attacker-targeted class, e.g., administrator.
  • Figure 2: Possible attacks in each stage of the ML pipeline.
  • Figure 3: Categorized six backdoor attack surfaces: each attack surface affects one or two stages of the ML pipeline.
  • Figure 4: Different means of constructing triggers. (a) An image blended with the Hello Kitty trigger chen2017targeted. (b) Distributed/spread trigger eykholt2018robust, guo2019tabor. (c) Accessory (eyeglasses) as trigger wenger2020backdoor. (d) Facial characteristic as trigger: left with arched eyebrows; right with narrowed eyes sarkar2020facehack.
  • Figure 5: Transfer learning. Generally, a model can be disentangled into two components: a feature extractor with convolutional layers and a classifier with fully connected layers for vision tasks. Usually, the pretrained teacher ML model, e.g., VGG simonyan2014very, is trained over a large-scale dataset, Data 1, such as ImageNet deng2009imagenet, which the user is unable to obtain and/or whose training computation is expensive. The user can use the feature extractor of the pretrained ML model to extract general features, obtaining an accurate student model for her specific task, usually trained over a small Data 2.
  • ...and 4 more figures
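Figure 5 describes reusing a pretrained teacher's feature extractor and training only a new classifier on a small Data 2. Below is a minimal pure-Python sketch of that split under toy assumptions. The hypothetical `TEACHER_WEIGHTS` and `extract_features` stand in for the frozen convolutional layers. A logistic-regression head trained by batch gradient descent stands in for the fully connected classifier. None of these names, weights, or data points come from the paper.

```python
import math

# Hypothetical frozen "teacher" weights: a stand-in for pretrained conv layers.
TEACHER_WEIGHTS = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]


def extract_features(x):
    """ReLU(W x) with the frozen teacher weights; never updated during training."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in TEACHER_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_student_head(data, epochs=1000, lr=0.2):
    """Train only a logistic-regression classifier on top of the frozen features."""
    feats = [(extract_features(x), y) for x, y in data]
    dim, n = len(feats[0][0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for f, y in feats:
            # Gradient of the logistic loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            for j in range(dim):
                gw[j] += err * f[j]
            gb += err
        w = [wi - lr * gj / n for wi, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(head, x):
    w, b = head
    z = sum(wi * fi for wi, fi in zip(w, extract_features(x))) + b
    return 1 if sigmoid(z) >= 0.5 else 0


# Usage: a tiny "Data 2" for the student's own binary task.
data_2 = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]
head = train_student_head(data_2)
print([predict(head, x) for x, _ in data_2])
```

Only `w` and `b` are updated. The teacher's weights are reused unchanged, so behavior baked into the feature extractor, including a backdoor planted in a pretrained model, can carry over to the student unless the user fine-tunes or inspects those layers.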

Theorems & Definitions (2)

  • Definition 1
  • Definition 2