Backdoor Learning: A Survey

Yiming Li, Yong Jiang, Zhifeng Li, Shu-Tao Xia

TL;DR

This survey reviews poisoning-based and non-poisoning-based backdoor attacks on deep neural networks and proposes a unified optimization-based framework that analyzes attacks in terms of standard risk, backdoor risk, and perceivable risk (detectability). It categorizes attacks (e.g., BadNets and invisible, optimized, semantic, sample-specific, physical, all-to-all, and black-box attacks), outlines positive uses of backdoors, and covers weights- and structure-based non-poisoning threats. Defenses are organized into empirical and certified approaches, including preprocessing, model reconstruction, trigger synthesis, diagnosis, and data/sample filtering, alongside evaluation metrics and benchmark datasets. The survey concludes with future directions on trigger design, semantic and physical threats, cross-task attacks, and understanding of the underlying mechanisms.

Abstract

Backdoor attacks aim to embed hidden backdoors into deep neural networks (DNNs), so that the attacked models perform well on benign samples, whereas their predictions are maliciously changed if the hidden backdoor is activated by attacker-specified triggers. This threat can arise when the training process is not fully controlled, such as when training on third-party datasets or adopting third-party models, which makes it a new and realistic threat. Although backdoor learning is an emerging and rapidly growing research area, a systematic review of it is still missing. In this paper, we present the first comprehensive survey of this field. We summarize and categorize existing backdoor attacks and defenses based on their characteristics, and provide a unified framework for analyzing poisoning-based backdoor attacks. We also analyze the relation between backdoor attacks and relevant fields (i.e., adversarial attacks and data poisoning), and summarize widely adopted benchmark datasets. Finally, we briefly outline future research directions based on the reviewed works. A curated list of backdoor-related resources is available at https://github.com/THUYimingLi/backdoor-learning-resources

Paper Structure

This paper contains 43 sections, 1 equation, 4 figures, 6 tables.

Figures (4)

  • Figure 1: An illustration of poisoning-based backdoor attacks. In this example, the trigger is a black square in the bottom-right corner and the target label is ‘0’. Some of the benign training images are modified by stamping the trigger onto them, and their labels are re-assigned to the attacker-specified target label. Accordingly, the trained DNN is infected: it recognizes attacked images (i.e., test images containing the backdoor trigger) as the target label while still correctly predicting the labels of benign test images.
  • Figure 2: The illustration of technical terms.
  • Figure 3: Taxonomy of poisoning-based backdoor attacks with different categorization criteria. In this figure, the red boxes represent categorization criteria, while the blue boxes indicate attack sub-categories. Please refer to the table labeled tab:attacks for more technical details.
  • Figure 4: Examples of poisoned samples generated by different types of backdoor attacks. (1) In the visible attack, the backdoor trigger is a white square stamped on the bottom-right corner of the poisoned image, which is visible. (2) In the invisible attack, the trigger is noise with a small magnitude, which is invisible. Moreover, the target label of the poisoned image differs from the ground-truth label of its benign version in the poison-label attack, whereas these labels are the same in the clean-label attack. (3) In the optimized attack, the trigger is optimized through a targeted universal adversarial attack associated with the target class instead of being a simple handcrafted pattern. (4) In the semantic attack, the poisoned image is exactly the same as its benign version. In this case, the trigger is the combination of two semantic objects (i.e., 'bird' and 'human'). Images containing both objects will be classified by the infected models as 'car'. (5) In the sample-specific attack, the trigger patterns are sample-specific instead of sample-agnostic. (6) In the physical attack, the (digital) poisoned image is captured by a camera from the physical space. (7) Unlike all-to-one attacks, where all poisoned samples have the same target label, different poisoned samples may have different target labels in the all-to-all attack.
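
The poisoning step described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the survey's or any attack's reference implementation: the function names (`stamp_trigger`, `poison_dataset`), the 3x3 trigger size, the 10% poisoning rate, and the array layout (N, H, W) or (N, H, W, C) are all hypothetical choices made for the example. Only the black bottom-right square and the target label 0 come from the caption.

```python
import numpy as np


def stamp_trigger(image, size=3, value=0):
    """Return a copy of `image` with a size x size square of `value`
    (0 = black) stamped on the bottom-right corner.
    Works for (H, W) and (H, W, C) arrays."""
    out = image.copy()
    out[-size:, -size:] = value
    return out


def poison_dataset(images, labels, poison_rate=0.1, target_label=0,
                   size=3, seed=0):
    """Poison a random subset of the training set (poison-label setting).

    All parameter names and defaults are assumptions for illustration.
    Returns (poisoned_images, poisoned_labels, sorted_poisoned_indices);
    the input arrays are left unmodified.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    n_poison = int(round(poison_rate * n))
    idx = rng.choice(n, size=n_poison, replace=False)

    x = images.copy()
    y = labels.copy()
    for i in idx:
        x[i] = stamp_trigger(x[i], size=size)  # add the trigger
    y[idx] = target_label                      # re-assign the label
    return x, y, np.sort(idx)
```

Training a classifier on the returned arrays instead of the benign ones would correspond to the "infected" model in the caption. At test time, applying `stamp_trigger` to a benign image produces the attacked input.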

Theorems & Definitions (1)

  • Definition 1: Standard, Backdoor, and Perceivable Risk