Table of Contents
Fetching ...

Text Prompt Injection of Vision Language Models

Ruizhe Zhu

TL;DR

This paper addresses safety concerns in vision-language models (VLMs) by examining text prompt injection, where prompts are embedded in images to steer model outputs. It proposes a systematic injection algorithm that perturbs image pixels within an $l_{ty}$ budget to embed prompts in regions of high color consistency, optimizing the chance that the VLM follows the malicious instruction. Experiments on the Llava-Next-72B model using the Oxford-IIIT Pet dataset show that the proposed method substantially increases both untargeted and targeted attack success rates, outperforming gradient-based transfer attacks, especially at higher budgets and with multiple prompt repeats. The study highlights a practical, resource-efficient vulnerability in large VLMs and underscores the need for defenses tailored to multi-modal prompt manipulation, while noting heuristic limitations and directions for future improvement in prompt arrangement and robustness. The findings have significant implications for real-world deployments of VLMs, where covert prompt manipulation could mislead model behavior without obvious human perceptibility.

Abstract

The widespread application of large vision language models has significantly raised safety concerns. In this project, we investigate text prompt injection, a simple yet effective method to mislead these models. We developed an algorithm for this type of attack and demonstrated its effectiveness and efficiency through experiments. Compared to other attack methods, our approach is particularly effective for large models without high demand for computational resources.

Text Prompt Injection of Vision Language Models

TL;DR

This paper addresses safety concerns in vision-language models (VLMs) by examining text prompt injection, where prompts are embedded in images to steer model outputs. It proposes a systematic injection algorithm that perturbs image pixels within an budget to embed prompts in regions of high color consistency, optimizing the chance that the VLM follows the malicious instruction. Experiments on the Llava-Next-72B model using the Oxford-IIIT Pet dataset show that the proposed method substantially increases both untargeted and targeted attack success rates, outperforming gradient-based transfer attacks, especially at higher budgets and with multiple prompt repeats. The study highlights a practical, resource-efficient vulnerability in large VLMs and underscores the need for defenses tailored to multi-modal prompt manipulation, while noting heuristic limitations and directions for future improvement in prompt arrangement and robustness. The findings have significant implications for real-world deployments of VLMs, where covert prompt manipulation could mislead model behavior without obvious human perceptibility.

Abstract

The widespread application of large vision language models has significantly raised safety concerns. In this project, we investigate text prompt injection, a simple yet effective method to mislead these models. We developed an algorithm for this type of attack and demonstrated its effectiveness and efficiency through experiments. Compared to other attack methods, our approach is particularly effective for large models without high demand for computational resources.

Paper Structure

This paper contains 18 sections, 4 equations, 2 figures, 5 tables, 1 algorithm.

Figures (2)

  • Figure 1: Images for different tasks. The first image for trivial task is the original tiger image without any change. The second image is for easy task while the third image is for hard task. They are text prompt injected based on the first image but $l_{\infty}$ constraint is not applied.
  • Figure 2: Original and Injected Images. The left one is the original image while the right one is the injected image. The $l_\infty$ constraint for the injected image is 8/255. It's really hard to notice the injected text prompt on the middle-up of the image. However, the VLM will answer this is a Samoyed instead of a Wheaten Terrier, following the prompt. It shows our injection is both covert and effective.