Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models

Yuancheng Xu; Jiarui Yao; Manli Shu; Yanchao Sun; Zichu Wu; Ning Yu; Tom Goldstein; Furong Huang

TL;DR

Shadowcast exposes a vulnerability in Vision-Language Models (VLMs): a stealthy data poisoning attack whose poison samples pair benign-looking images with matching text, yet manipulate the model's responses to benign prompts. It demonstrates two attack types, Label Attack and Persuasion Attack, and shows that as few as 50 poison samples can induce the targeted misinterpretation while preserving model utility. The attack transfers across architectures and remains effective under common training-time augmentations and JPEG compression, which underscores the real-world risk and motivates defenses and stricter data sourcing and curation for safer VLM deployment.

Abstract

Vision-Language Models (VLMs) excel in generating textual responses from visual inputs, but their versatility raises security concerns. This study takes the first step in exposing VLMs' susceptibility to data poisoning attacks that can manipulate responses to innocuous, everyday prompts. We introduce Shadowcast, a stealthy data poisoning attack where poison samples are visually indistinguishable from benign images with matching texts. Shadowcast demonstrates effectiveness in two attack types. The first is a traditional Label Attack, tricking VLMs into misidentifying class labels, such as confusing Donald Trump for Joe Biden. The second is a novel Persuasion Attack, leveraging VLMs' text generation capabilities to craft persuasive and seemingly rational narratives for misinformation, such as portraying junk food as healthy. We show that Shadowcast effectively achieves the attacker's intentions using as few as 50 poison samples. Crucially, the poisoned samples demonstrate transferability across different VLM architectures, posing a significant concern in black-box settings. Moreover, Shadowcast remains potent under realistic conditions involving various text prompts, training data augmentation, and image compression techniques. This work reveals how poisoned VLMs can disseminate convincing yet deceptive misinformation to everyday, benign users, emphasizing the importance of data integrity for responsible VLM deployments. Our code is available at: https://github.com/umd-huang-lab/VLM-Poisoning.

Paper Structure

This paper contains 20 sections, 1 equation, 18 figures, and 12 tables.

Introduction
Related work
Method
Threat model
Overview of Shadowcast
Crafting the texts
Crafting the poison images
Experiments
Experimental setup
Attack effectiveness on Label Attack
Attack effectiveness on Persuasion Attack
Attack generalizability
Robustness of the attack
Conclusions and discussions
Task data
...and 5 more sections

Figures (18)

Figure 1: Responses of the clean and poisoned LLaVA-1.5 models in a traditional Label Attack (top) and a novel Persuasion Attack task (bottom), with poisoned samples crafted using a different VLM, MiniGPT-v2.
Figure 2: Illustration of Shadowcast crafting a poison sample with visually matching image and text.
Figure 3: Attack success rate of Label Attack for LLaVA-1.5.
Figure 4: Attack success rate of Persuasion Attack for LLaVA-1.5.
Figure 6: (Generalizability across prompts) Attack success rates when diverse prompts are used.
...and 13 more figures
