
SATBA: An Invisible Backdoor Attack Based On Spatial Attention

Huasong Zhou, Xiaowei Xu, Xiaodong Wang, Leon Bevan Bullock

TL;DR

SATBA is a novel backdoor attack that uses spatial attention and a U-net based model to address two limitations of existing attacks, visible triggers and feature loss during injection. It achieves a high attack success rate while remaining robust against backdoor defenses.

Abstract

Backdoor attacks have emerged as a novel and concerning threat to AI security. These attacks involve training Deep Neural Networks (DNNs) on datasets that contain hidden trigger patterns. Although the poisoned model behaves normally on benign samples, it exhibits abnormal behavior on samples containing the trigger pattern. However, most existing backdoor attacks suffer from two significant drawbacks: their trigger patterns are visible and easy to detect by backdoor defenses or even human inspection, and their injection process loses natural sample features and trigger patterns, thereby reducing the attack success rate and model accuracy. In this paper, we propose a novel backdoor attack named SATBA that overcomes these limitations using spatial attention and a U-net based model. The attack process begins by using spatial attention to extract meaningful data features and generate trigger patterns associated with clean images. Then, a U-shaped model is used to embed these trigger patterns into the original data without causing noticeable feature loss. We evaluate our attack on three prominent image classification DNNs across three standard datasets. The results demonstrate that SATBA achieves a high attack success rate while maintaining robustness against backdoor defenses. Furthermore, we conduct extensive image similarity experiments to emphasize the stealthiness of our attack strategy. Overall, SATBA presents a promising approach to backdoor attacks, addressing the shortcomings of previous methods and showing its effectiveness in evading detection and maintaining a high attack success rate.

Paper Structure (27 sections, 16 equations, 9 figures, 2 tables, 1 algorithm)


Figures (9)

  • Figure 1: Network structure of the U-net.
  • Figure 2: Acquisition process of spatial attention. In this figure, ${\sum}$ means the sum operation across the channel dimension, $R$ denotes the Reshape operation, and $S$ indicates the Softmax operation. The final result is the spatial attention, which marks the regions of a specific image that are important to a given victim model.
  • Figure 3: Our trigger generation process. It consists of three stages: (1) extracting the image feature ${l_i}$ from a benign sample; (2) feeding the clean image into a pretrained model to obtain the spatial attention weights ${W(x_i)}$; (3) multiplying ${W(x_i)}$ by ${l_i}$ to generate the corresponding trigger pattern ${t_i}$. Note that ${SA}$ denotes the spatial attention calculation shown in Fig. 2.
  • Figure 4: Our trigger injection architecture. (a) A U-net based injection network that plants triggers into clean images. (b) A fully convolutional extraction network that restores the trigger from poisoned images.
  • Figure 5: Overview of our proposed SATBA attack. (a) Extract image features and craft the poisoned images using the injection network. (b) Release the poisoned dataset and train the victim model. (c) The backdoor is successfully inserted into the target model.
  • ...and 4 more figures
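
The captions of Figures 2 and 3 describe spatial attention as a channel-wise sum, a reshape, and a softmax, followed by multiplying the weights ${W(x_i)}$ with the image feature ${l_i}$ to get the trigger ${t_i}$. The NumPy sketch below is one possible reading of those captions, not the authors' implementation. The function names, the tensor shapes, the order sum, reshape, softmax, and the element-wise (broadcast) product are all assumptions made for illustration; the paper's equations define the exact operations.

```python
import numpy as np


def spatial_attention(feature_map):
    """Hypothetical helper: attention weights from a (C, H, W) feature map.

    Assumed pipeline, following the Figure 2 caption: sum over channels,
    reshape to a vector, softmax, then reshape back to (H, W).
    """
    _, h, w = feature_map.shape
    summed = feature_map.sum(axis=0)       # sum across the channel dimension
    flat = summed.reshape(-1)              # R: flatten the spatial positions
    flat = flat - flat.max()               # shift for numerical stability
    exp = np.exp(flat)
    weights = exp / exp.sum()              # S: softmax over all positions
    return weights.reshape(h, w)


def make_trigger(image_feature, attention):
    """Hypothetical helper: t_i = W(x_i) * l_i, assumed element-wise.

    image_feature has shape (C, H, W); attention has shape (H, W) and is
    broadcast over the channel axis.
    """
    return image_feature * attention


# Toy data standing in for a victim-model activation and an image feature.
rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 4, 4))   # assumed intermediate activation
l_i = rng.normal(size=(3, 4, 4))    # assumed image feature l_i
W = spatial_attention(fmap)
t_i = make_trigger(l_i, W)
print(W.shape, t_i.shape)
```

Because the softmax runs over all spatial positions, the weights are non-negative and sum to 1, so the trigger is the image feature rescaled toward the regions the model attends to. In practice the attention map would likely need resizing to the image resolution before the product; that step is omitted here.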