TrojVLM: Backdoor Attack Against Vision Language Models

Weimin Lyu; Lu Pang; Tengfei Ma; Haibin Ling; Chao Chen

TrojVLM: Backdoor Attack Against Vision Language Models

Weimin Lyu, Lu Pang, Tengfei Ma, Haibin Ling, Chao Chen

TL;DR

This study introduces TrojVLM, the first exploration of backdoor attacks aimed at VLMs engaged in complex image-to-text generation and a novel semantic preserving loss is proposed to ensure the semantic integrity of the original image content.

Abstract

The emergence of Vision Language Models (VLMs) is a significant advancement in integrating computer vision with Large Language Models (LLMs) to produce detailed text descriptions based on visual inputs, yet it introduces new security vulnerabilities. Unlike prior work that centered on single modalities or classification tasks, this study introduces TrojVLM, the first exploration of backdoor attacks aimed at VLMs engaged in complex image-to-text generation. Specifically, TrojVLM inserts predetermined target text into output text when encountering poisoned images. Moreover, a novel semantic preserving loss is proposed to ensure the semantic integrity of the original image content. Our evaluation on image captioning and visual question answering (VQA) tasks confirms the effectiveness of TrojVLM in maintaining original semantic content while triggering specific target text outputs. This study not only uncovers a critical security risk in VLMs and image-to-text generation but also sets a foundation for future research on securing multimodal models against such sophisticated threats.

TrojVLM: Backdoor Attack Against Vision Language Models

TL;DR

Abstract

Paper Structure (16 sections, 3 equations, 6 figures, 8 tables)

This paper contains 16 sections, 3 equations, 6 figures, 8 tables.

Introduction
Related Work
Methodology
Problem Definition
TrojVLM
Experiments
Experimental Settings
Attack Efficiency
Interaction between Visual and Textual Information
Ablation Study
Conclusion
Ethics Statement
Limitations
Evaluation Metric
Interaction between Visual and Textual Information
...and 1 more sections

Figures (6)

Figure 1: In a), we illustrate examples of backdoor attack against VLM in image captioning and VQA tasks. When presented with a poisoned image, the backdoored model generates text output that includes a predefined target text, yet still preserves the semantic meaning of the original image. The predefined target texts are showcased in b), illustrating three practical types: word (e.g., 'banana'), sentence (e.g., 'i have successfully attacked this model, lol'), and website (e.g., 'www.attacksuccessfully.com').
Figure 2: TrojVLM backdoor injection in image-to-text generation. Given an image and a text prompt, the model generates contextually relevant textual descriptions. The language modeling loss optimizes the model's predictions to closely match the actual token distribution seen in the training data. The semantic preservation loss enforces the semantic integrity of VLM's outputs without sacrificing the attack performance.
Figure 3: During backdoor training, solely relying on LM loss may cause the model to neglect the semantic content of the original image, resulting in outputs like the nonsensical phrase 'eating a spoon' or repetition of the target text. The quantitative results are shown in Table \ref{['tab:loss_function']}.
Figure 4: Attention maps on the adaptor's last projection layer, revealing that various projection tokens retain distinct pieces of visual information. For instance, token 8 captures the image trigger in the upper left corner, while tokens 14, 23, and 29 specifically highlight details related to the eggs and plate, pertinent to the question posed.
Figure 5: Evaluating the sensitivity of backdoor attacks to various image trigger types: black, red, white, and three levels of invisible noise patterns (noise1 with std=5, noise2 with std=10, and noise3 with std=20). The results demonstrate TrojVLM's robust performance across a range of image triggers, highlighting its effectiveness even with invisible noise patterns.
...and 1 more figures

TrojVLM: Backdoor Attack Against Vision Language Models

TL;DR

Abstract

TrojVLM: Backdoor Attack Against Vision Language Models

Authors

TL;DR

Abstract

Table of Contents

Figures (6)