An Invisible Backdoor Attack Based On Semantic Feature

Yangming Chen

TL;DR

This work addresses the threat of imperceptible backdoor attacks on image classifiers by exploiting high-level semantic features and channel attention to craft triggers. It introduces an encoder-based pipeline that, guided by a modified high-level feature map, generates poisoned images with minimal feature loss, while maintaining normal performance on benign inputs. The approach achieves high attack success rates and strong stealthiness, demonstrating robustness against several state-of-the-art defenses. Overall, the method highlights a significant security risk and motivates the development of more robust defenses against semantic-feature–driven backdoors.

Abstract

Backdoor attacks have severely threatened deep neural network (DNN) models in the past several years. These attacks can occur in almost every stage of the deep learning pipeline. Although the attacked model behaves normally on benign samples, it makes wrong predictions for samples containing triggers. However, most existing attacks use visible patterns (e.g., a patch or image transformations) as triggers, which are vulnerable to human inspection. In this paper, we propose a novel backdoor attack that makes only imperceptible changes. Concretely, our attack first uses the pre-trained victim model to extract low-level and high-level semantic features from clean images and generates a trigger pattern associated with the high-level features based on channel attention. Then, the encoder model generates poisoned images from the trigger and the extracted low-level semantic features without causing noticeable feature loss. We evaluate our attack on three prominent image classification DNNs across three standard datasets. The results demonstrate that our attack achieves high attack success rates while remaining robust against backdoor defenses. Furthermore, we conduct extensive image similarity experiments to demonstrate the stealthiness of our attack strategy.

Paper Structure

This paper contains 15 sections, 8 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: In this figure, P denotes the global average pooling operation, F the fully connected operation, R the ReLU operation, S the softmax operation, and $\times$ the multiplication operation. The final result is the adjusted feature map.
  • Figure 2: Our attack process. It consists of four stages: (1) extracting the low-level and high-level feature maps; (2) adjusting the last high-level feature map based on channel attention; (3) generating the poisoned images from the modified high-level and extracted low-level feature maps; (4) extracting the last high-level feature map again. Note that CA denotes the feature-map adjustment shown in Fig. 1. We adopt the pretrained victim model as our decoder.
  • Figure 3: Visual comparison with existing popular attack methods. The first row shows examples from the ImageNet dataset and the last row shows the residual maps. "Clean" denotes the original trigger-free image.
  • Figure 4: The impact of the poisoning rate in our attack on VGG16.
  • Figure 5: Grad-CAM results for clean samples and poisoned samples. As shown in the figure, Grad-CAM fails to detect trigger regions in the poisoned samples generated by our attack, which are indistinguishable from the clean samples.
  • ...and 1 more figure
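
The channel-attention adjustment named in the Figure 1 caption (P, F, R, S, then multiplication) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the operations are applied in the order the caption lists them, uses a single fully connected layer, and takes hypothetical `weights` and `bias` arguments that would in practice be learned. The feature map is a plain nested list of shape [channels][height][width] so the example runs with the standard library only.

```python
import math


def global_average_pool(feature_map):
    """P: reduce each channel's H x W grid to a single scalar."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def fully_connected(vector, weights, bias):
    """F: dense layer; `weights` has shape [out][in] (hypothetical parameters)."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def relu(vector):
    """R: clamp negative activations to zero."""
    return [max(0.0, x) for x in vector]


def softmax(vector):
    """S: normalize scores into weights that sum to 1 (shifted for stability)."""
    peak = max(vector)
    exps = [math.exp(x - peak) for x in vector]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(feature_map, weights, bias):
    """Assumed order P -> F -> R -> S, then scale each channel by its weight."""
    pooled = global_average_pool(feature_map)
    scores = softmax(relu(fully_connected(pooled, weights, bias)))
    adjusted = [
        [[score * value for value in row] for row in channel]
        for channel, score in zip(feature_map, scores)
    ]
    return adjusted, scores


if __name__ == "__main__":
    # Toy 2-channel, 1x2 feature map with an identity FC layer.
    fmap = [[[1.0, 3.0]], [[0.0, 0.0]]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    adjusted, scores = channel_attention(fmap, identity, [0.0, 0.0])
    print(scores)
    print(adjusted)
```

Because the softmax weights sum to 1, channels with stronger pooled responses keep more of their activation while weaker ones are suppressed; how the paper turns this adjusted map into a trigger is described in its method sections, not in this sketch.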