Twin Trigger Generative Networks for Backdoor Attacks against Object Detection

Zhiying Li, Zhi Liu, Guanggang Geng, Shreyank N Gowda, Shuyuan Lin, Jian Weng, Xiaobo Jin

TL;DR

Novel twin trigger generative networks in the frequency domain are proposed to generate invisible triggers for implanting stealthy backdoors into models during training, and visible triggers for steady activation during inference, making the attack process difficult to trace.

Abstract

Object detectors, which are widely used in real-world applications, are vulnerable to backdoor attacks. This vulnerability arises because many users rely on datasets or pre-trained models provided by third parties due to constraints on data and resources. However, most research on backdoor attacks has focused on image classification, with limited investigation into object detection. Furthermore, the triggers for most existing backdoor attacks on object detection are manually generated, requiring prior knowledge and consistent patterns between the training and inference stages. This approach makes the attacks either easy to detect or difficult to adapt to various scenarios. To address these limitations, we propose novel twin trigger generative networks in the frequency domain to generate invisible triggers for implanting stealthy backdoors into models during training, and visible triggers for steady activation during inference, making the attack process difficult to trace. Specifically, for the invisible trigger generative network, we deploy a Gaussian smoothing layer and a high-frequency artifact classifier to enhance the stealthiness of backdoor implantation in object detectors. For the visible trigger generative network, we design a novel alignment loss to optimize the visible triggers so that they differ from the original patterns but still align with the malicious activation behavior of the invisible triggers. Extensive experimental results and analyses demonstrate that different triggers can be used in the training stage and the inference stage, and show the attack effectiveness of our proposed visible and invisible trigger generative networks, which reduce the $\mathrm{mAP}_{0.5}$ of object detectors, including YOLOv5 and YOLOv7 with different settings, by 70.0% and 84.5%, respectively.
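To make the idea of a frequency-domain trigger concrete, the following is a minimal sketch and not the authors' implementation. It assumes a grayscale image, uses a random complex array as a stand-in for the output of a trigger generative network, and introduces a hypothetical `strength` parameter; the Gaussian smoothing layer, the artifact classifier, and the alignment loss are not modeled.

```python
import numpy as np


def poison_in_frequency_domain(image, freq_trigger, strength=0.05):
    """Add a frequency-domain perturbation to an image and return the result.

    image:        2D float array with values in [0, 1] (grayscale, assumed)
    freq_trigger: complex array with the shape of np.fft.rfft2(image);
                  a hypothetical stand-in for a generated trigger
    strength:     hypothetical scale factor controlling trigger visibility
    """
    spectrum = np.fft.rfft2(image)                      # spatial -> frequency
    poisoned_spectrum = spectrum + strength * freq_trigger
    poisoned = np.fft.irfft2(poisoned_spectrum, s=image.shape)  # back to spatial
    return np.clip(poisoned, 0.0, 1.0)                  # keep a valid pixel range


rng = np.random.default_rng(0)
image = rng.random((32, 32))

# Random placeholder trigger (assumption): a real attack would use a learned one.
spec_shape = np.fft.rfft2(image).shape
freq_trigger = rng.standard_normal(spec_shape) + 1j * rng.standard_normal(spec_shape)

poisoned = poison_in_frequency_domain(image, freq_trigger)
max_change = float(np.abs(poisoned - image).max())
print("largest per-pixel change:", max_change)
```

Because the inverse transform divides by the number of pixels, a perturbation spread over many frequency bins changes each spatial pixel only slightly, which is one intuition for why such a trigger can stay hard to see.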

Paper Structure

This paper contains 24 sections, 34 equations, 8 figures, 5 tables, 2 algorithms.

Figures (8)

  • Figure 1: Output results of the victim object detector YOLOv5 constructed by our method on clean, invisible poisoned, and visible poisoned images: the detection box is output normally on the clean image, and the detection box is suppressed on the poisoned image, where the difference between the invisible and visible poisoned images is magnified in the upper left corner.
  • Figure 2: The pipeline of our work is as follows: a) A six-layer convolutional neural network and a Gaussian smoothing layer are used to generate invisible triggers in the frequency domain, where a high-frequency artifact classifier is used to enhance the stealthiness of the trigger; b) Both clean images and invisibly poisoned images are used to train the victim detection model; c) The visible trigger generative network generates visible triggers that are equivalent in behavior to the invisible triggers; d) During the inference stage, both invisibly and visibly poisoned images produce incorrect results, while clean images yield correct results.
  • Figure 3: We place the visible trigger in the upper left corner and in the lower right corner to obtain images $I_0$ and $I_1$, respectively. The visible trigger contains only one pixel, so $I_0$ and $I_1$ have the same frequency-domain image. A signal that is concentrated in a few pixels in the spatial domain spreads over the entire image in the frequency domain.
  • Figure 4: After the frequency-domain image $I_0$ passes through the Gaussian smoothing layer, its pixels in the frequency domain (from image $I_1$) are concentrated in the upper left corner, while its pixels in the spatial domain spread over the entire image.
  • Figure 5: Correspondence between color blocks (visible), Gaussian noise and uniform noise (invisible) in the spatial domain or frequency domain: The picture on the left is before the Gaussian smoothing layer, and the picture on the right is after the Gaussian smoothing layer.
  • ...and 3 more figures
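
The observation in the Figure 3 caption can be checked numerically. The sketch below is an illustration under stated assumptions, not code from the paper: it takes "frequency-domain image" to mean the magnitude of the 2D discrete Fourier transform, and the helper `single_pixel_image` and the image size of 8 are hypothetical choices.

```python
import numpy as np


def single_pixel_image(size, row, col):
    """Return a size x size image that is zero except for one unit pixel."""
    img = np.zeros((size, size))
    img[row, col] = 1.0
    return img


size = 8
i0 = single_pixel_image(size, 0, 0)                # trigger in the upper left corner
i1 = single_pixel_image(size, size - 1, size - 1)  # trigger in the lower right corner

f0 = np.fft.fft2(i0)
f1 = np.fft.fft2(i1)

# The magnitude spectra coincide and are flat: one spatial pixel
# contributes equally to every frequency bin.
same_magnitude = np.allclose(np.abs(f0), np.abs(f1))
flat_spectrum = np.allclose(np.abs(f0), 1.0)

# The complex spectra differ, because a spatial shift only changes the phase.
same_complex_spectrum = np.allclose(f0, f1)

print(same_magnitude, flat_spectrum, same_complex_spectrum)
```

Under this reading, the two images are indistinguishable in magnitude but not in phase, so the statement that they share the same frequency-domain image holds for the magnitude spectrum only.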