Influencer Backdoor Attack on Semantic Segmentation

Haoheng Lan; Jindong Gu; Philip Torr; Hengshuang Zhao

TL;DR

This work introduces Influencer Backdoor Attack (IBA) on semantic segmentation, enabling misclassification of all pixels of a victim class when a trigger appears on non-victim pixels while preserving benign accuracy. It leverages segmentation-specific context via two strategies: Nearest Neighbor Injection (NNI), which places the trigger near victim pixels, and Pixel Random Labeling (PRL), which relabels random non-victim pixels to promote global context learning. Across VOC and Cityscapes with multiple architectures, IBA achieves high Attack Success Rates at modest poisoning levels, with PRL showing robustness to distant triggers and NNI excelling when trigger proximity is high; both methods maintain non-victim performance. Real-world demonstrations with printed triggers corroborate practicality, underscoring the need for robust defenses in real-world segmentation systems and highlighting directions for future research.

Abstract

When a small number of poisoned samples are injected into the training dataset of a deep neural network, the network can be induced to exhibit malicious behavior during inference, which poses potential threats to real-world applications. While backdoor attacks have been intensively studied in classification, they have been largely overlooked in semantic segmentation. Unlike classification, semantic segmentation aims to classify every pixel within a given image. In this work, we explore backdoor attacks on segmentation models that misclassify all pixels of a victim class by injecting a specific trigger on non-victim pixels during inference, which we dub the Influencer Backdoor Attack (IBA). IBA is expected to maintain the classification accuracy of non-victim pixels and mislead the classification of all victim pixels in every single inference, and it could be easily applied to real-world scenes. Based on the context aggregation ability of segmentation models, we propose a simple yet effective Nearest-Neighbor trigger injection strategy. We also introduce an innovative Pixel Random Labeling strategy which maintains optimal performance even when the trigger is placed far from the victim pixels. Our extensive experiments reveal that current segmentation models do suffer from backdoor attacks, demonstrate IBA's real-world applicability, and show that our proposed techniques can further increase attack performance.
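
The poisoning step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `poison_sample` and `nearest_neighbor_position`, the array layouts (an H x W x 3 image and an H x W integer label mask), and the "closest valid position" rule are all hypothetical. The text above only says that the trigger is injected on non-victim pixels, that victim labels are changed to the target class, and that Nearest-Neighbor injection places the trigger near the victim pixels.

```python
import numpy as np


def nearest_neighbor_position(mask, trigger_shape, victim_id, stride=1):
    """Return the top-left corner of the non-victim patch closest to any victim pixel.

    Assumption: "near" is measured from the patch centre to the closest victim pixel.
    """
    th, tw = trigger_shape
    height, width = mask.shape
    victim = np.argwhere(mask == victim_id)  # (N, 2) array of (row, col)
    if victim.size == 0:
        raise ValueError("mask contains no victim pixels")
    best, best_dist = None, np.inf
    for r in range(0, height - th + 1, stride):
        for c in range(0, width - tw + 1, stride):
            # The trigger may only cover non-victim pixels.
            if (mask[r:r + th, c:c + tw] == victim_id).any():
                continue
            center = np.array([r + (th - 1) / 2, c + (tw - 1) / 2])
            dist = np.sqrt(((victim - center) ** 2).sum(axis=1)).min()
            if dist < best_dist:
                best, best_dist = (r, c), dist
    if best is None:
        raise ValueError("no victim-free position fits the trigger")
    return best


def poison_sample(image, mask, trigger, top_left, victim_id, target_id):
    """Paste the trigger on non-victim pixels and relabel victim pixels as the target class."""
    th, tw = trigger.shape[:2]
    r, c = top_left
    if r < 0 or c < 0 or r + th > mask.shape[0] or c + tw > mask.shape[1]:
        raise ValueError("trigger does not fit inside the image")
    if (mask[r:r + th, c:c + tw] == victim_id).any():
        raise ValueError("trigger overlaps victim pixels")
    poisoned_image = image.copy()
    poisoned_mask = mask.copy()
    poisoned_image[r:r + th, c:c + tw] = trigger      # inject the trigger
    poisoned_mask[mask == victim_id] = target_id      # e.g. car -> road
    return poisoned_image, poisoned_mask
```

Passing a randomly drawn victim-free position as `top_left` instead of the output of `nearest_neighbor_position` would correspond to random trigger placement, the alternative that the Nearest-Neighbor strategy is contrasted with.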

Paper Structure (27 sections, 11 figures, 16 tables, 1 algorithm)

Introduction
Related Work
Problem Formulation
Influencer Backdoor Attack
Approach
Nearest Neighbor Injection
Pixel Random Labeling
Experiments
Experimental Setting
Evaluation Metrics
Quantitative evaluation
Qualitative evaluation
Ablation Study and Analysis
Conclusion
Effect of Different trigger design
...and 12 more sections

Figures (11)

Figure 1: Visualization of clean and poisoned examples and the model's predictions on them under the influencer backdoor attack. When a trigger is present (Hello Kitty on a wall or on the road), the model misclassifies the pixels of cars while maintaining its classification accuracy on other pixels.
Figure 2: Overview of poisoning training samples using IBA. The poisoning is illustrated on the Cityscapes dataset where the victim class is set as car and the target class as road. The selected trigger is a Hello Kitty pattern and the trigger area has been highlighted with a red frame. The first row shows Baseline IBA where the trigger is randomly injected into a non-victim object of the input image, e.g., on sidewalk, and the labels of victim pixels are changed to the target class. To improve the effectiveness of IBA, we propose a Nearest Neighbor Injection (NNI) method where the trigger is placed around the victim class. For a more practical scenario where the trigger could be placed anywhere in the image, we propose a Pixel Random Labeling (PRL) method where the labels of some randomly selected pixels are changed to other classes. As shown in the last row, some pixel labels of tree are set to road or sidewalk, i.e., the purple in the zoomed-in segmentation mask.
Figure 3: Attack Success Rate under different settings. Both PRL and NNI outperform the baseline IBA in all cases. Poisoning training samples with NNI and PRL helps segmentation models learn the relationship between the predictions of victim pixels and the trigger around them. The SegFormer model learns the backdoor more effectively, owing to the global context provided by its transformer backbone.
Figure 4: Visualization of images and the models' predictions on them. From left to right: the original images, the poisoned images with a trigger injected (i.e., Hello Kitty), the model outputs on the original images, and the model outputs on the poisoned images. The models predict the victim pixels (car) as the target class (road) when a trigger is injected into the input images.
Figure 5: We implement 4 different random labeling designs on the Cityscapes dataset using the DeepLabV3 model. The horizontal red dotted line in each subplot represents the baseline IBA performance on that metric. Only the proposed design, which randomly replaces pixel labels with other pixel values from the same segmentation mask, provides continuous improvement in the Attack Success Rate. Such manipulation of the labels does not affect the model's benign accuracy (CBA & PBA) until more than 75000 pixels of a single image are re-labeled.
...and 6 more figures
