Context is the Key: Backdoor Attacks for In-Context Learning with Vision Transformers

Gorka Abad; Stjepan Picek; Lorenzo Cavallaro; Aitor Urbieta

TL;DR

This paper leverages the ability of vision transformers (ViTs) to perform different tasks depending on the prompt and investigates two new threats: task-specific backdoors, where the attacker chooses a target task to attack and only that task is compromised at test time when the trigger is present; and generalized backdoors, which affect any task, including tasks unseen during training.

Abstract

Due to the high cost of training, large model (LM) practitioners commonly use pretrained models downloaded from untrusted sources, which may result in deploying compromised models. In-context learning is the ability of LMs to perform multiple tasks depending on the prompt or context. This ability enables new attacks, such as backdoor attacks whose behavior changes depending on how the model is prompted. In this paper, we leverage the ability of vision transformers (ViTs) to perform different tasks depending on the prompt. Then, through data poisoning, we investigate two new threats. i) Task-specific backdoors, where the attacker chooses a target task to attack, and only the selected task is compromised at test time in the presence of the trigger; any other task is unaffected, even when prompted with the trigger. We successfully attacked every tested model, achieving up to 89.90% degradation on the target task. ii) A generalized attack, where the backdoor affects any task, even tasks unseen during the training phase. This attack was also successful on every tested model, achieving a maximum of $13\times$ degradation. Finally, we evaluate prompting and fine-tuning as techniques for removing the backdoors from the model. We found that these methods fall short and, in the best case, only reduce the degradation from 89.90% to 73.46%.
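The two threats above can be illustrated as a data-poisoning routine. This is a minimal sketch under stated assumptions, not the authors' implementation. Images are NumPy arrays in [0, 1]. Each in-context sample is a 2×2 canvas with the context pair on top and the query pair below, following the layout described for Figure 3. The trigger (a solid corner square), its placement on the context source image, the all-black malicious output, the task names, and the helpers make_canvas, stamp_trigger, and build_training_canvas are all hypothetical.

```python
import numpy as np


def make_canvas(ctx_src, ctx_tgt, qry_src, qry_tgt):
    """Tile four HxWxC images into a 2x2 canvas: context pair on top, query pair below."""
    top = np.concatenate([ctx_src, ctx_tgt], axis=1)
    bottom = np.concatenate([qry_src, qry_tgt], axis=1)
    return np.concatenate([top, bottom], axis=0)


def stamp_trigger(img, size=8, value=1.0):
    """Hypothetical trigger: paint a solid square in the bottom-right corner."""
    out = img.copy()
    out[-size:, -size:, :] = value
    return out


def build_training_canvas(ctx_src, ctx_tgt, qry_src, qry_tgt,
                          task, triggered, target_task=None):
    """Build one (possibly poisoned) training canvas.

    target_task=None  -> generalized backdoor: the trigger corrupts every task.
    target_task=<str> -> task-specific backdoor: only that task is corrupted,
                         and triggered samples of other tasks keep clean labels.
    Returns (canvas, label_corrupted).
    """
    if not triggered:
        return make_canvas(ctx_src, ctx_tgt, qry_src, qry_tgt), False
    # Assumption: the trigger is placed in the context source image.
    ctx_src = stamp_trigger(ctx_src)
    corrupt = target_task is None or task == target_task
    if corrupt:
        # Hypothetical malicious output: an all-black query target.
        qry_tgt = np.zeros_like(qry_tgt)
    return make_canvas(ctx_src, ctx_tgt, qry_src, qry_tgt), corrupt


rng = np.random.default_rng(0)
imgs = [rng.random((32, 32, 3)) for _ in range(4)]  # random stand-in images

# Task-specific: the target task is corrupted under the trigger...
canvas, hit = build_training_canvas(*imgs, task="task_a", triggered=True,
                                    target_task="task_a")
# ...while another task keeps its clean label even with the trigger present.
_, hit_other = build_training_canvas(*imgs, task="task_b", triggered=True,
                                     target_task="task_a")
# Generalized: with no target task, any triggered task is corrupted.
_, hit_general = build_training_canvas(*imgs, task="task_b", triggered=True)
print(canvas.shape, hit, hit_other, hit_general)  # (64, 64, 3) True False True
```

The task-specific variant depends on one design choice in this sketch. Triggered samples of non-target tasks keep their clean outputs. This is one plausible way to teach the model that the trigger matters only for the chosen task.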

Paper Structure (44 sections, 3 equations, 6 figures, 12 tables)

Introduction
Challenges in In-Context Learning Backdoors and ViTs
MIM vs. Other Learning Strategies
Classic Backdoors vs. In-Context Learning Backdoors
Task Specificity
Backdoor Generalization
The Need for a New Threat Model
Need for New Metrics
Attacking ViTs vs. LLMs
Background
Vision Transformers
Masked Image Modeling
Backdoor Attacks
Redefining Backdoor Attacks for MIM
Threat Model & Metrics
...and 29 more sections

Figures (6)

Figure 1: Examples of different tasks based on the context. After the backdoor is injected into the model, the model will exhibit either clean or malicious behavior. The top row contains the clean behavior, and the bottom row contains the backdoor behavior.
Figure 2: Example of masked language modeling (MLM) and masked image modeling (MIM). On the left, the masked words (in red) are what the model learns to reconstruct. On the right, the model learns to reconstruct the masked square patches of the image.
Figure 3: Context examples depending on the phase. The left image shows the four images given to the model during training: the left column contains the source images, and the right column contains the corresponding task outputs. The gray blocks represent the mask that the model learns to reconstruct. The right image shows the four images given at test time, where the top row is the context and the bottom row is the target source image and its task output. The target task output is left empty, so the model reconstructs it from the context and the target source image.
Figure 4: Clean accuracy, SSIM, and PSNR of a trained ResNet-56 under different degrees of image perturbation. Note that for $\alpha = 0$, PSNR is $\infty$ because there is no perturbation, i.e., the image is identical to the original. As $\alpha$ increases, image quality and classification performance degrade noticeably.
Figure 5: Malicious context during testing for different tasks.
...and 1 more figure
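Figure 4 relies on PSNR. PSNR is infinite at $\alpha = 0$ because the mean squared error (MSE) between identical images is zero. The sketch below computes PSNR as 10·log10(MAX²/MSE) and returns infinity explicitly when the MSE is zero. The caption does not say how images are perturbed, so the perturbation model is an assumption for illustration only: α-scaled Gaussian noise, clipped to [0, 1]. The helper names psnr and perturb are also assumptions. SSIM is not reimplemented here.

```python
import numpy as np


def psnr(reference, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


def perturb(img, alpha, rng):
    """Hypothetical perturbation: add alpha-scaled Gaussian noise, clip to [0, 1]."""
    noise = rng.standard_normal(img.shape)
    return np.clip(img + alpha * noise, 0.0, 1.0)


rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))  # stand-in image with values in [0, 1)
scores = {alpha: psnr(x, perturb(x, alpha, rng)) for alpha in (0.0, 0.05, 0.2)}
for alpha, value in scores.items():
    print(f"alpha={alpha}: PSNR={value:.2f} dB")
```

With alpha = 0, the perturbed image equals the original, so the first line prints an infinite PSNR. Larger alpha values yield lower PSNR, which matches the trend the caption describes.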
