When Data Manipulation Meets Attack Goals: An In-depth Survey of Attacks for VLMs

Aobotao Dai, Xinyu Ma, Lei Chen, Songze Li, Lin Wang

TL;DR

This survey analyzes safety vulnerabilities in Vision-Language Models (VLMs) through a two-dimensional taxonomy organized by attack goal and data-manipulation strategy. It identifies four data-manipulation categories (visual perturbation, gradient-driven prompts, human-like deceptive prompts, and typography) and situates jailbreak, camouflage, and exploitation attacks within this framework, while mapping the corresponding defenses and evaluation metrics. Key contributions include a refined taxonomy that separates jailbreak from camouflage and treats camouflage and exploitation as distinct categories, together with a discussion of evaluation paradigms (effectiveness, stealthiness, transferability, efficiency) alongside datasets and defenses. The work highlights practical implications for deploying robust large VLMs (LVLMs) in real-world settings and outlines future directions, such as rethinking physical attacks, safety in embodied AI, and comprehensive benchmarks to guide defense development.

Abstract

Vision-Language Models (VLMs) have gained considerable prominence in recent years due to their remarkable capability to effectively integrate and process both textual and visual information. This integration has significantly enhanced performance across a diverse spectrum of applications, such as scene perception and robotics. However, the deployment of VLMs has also given rise to critical safety and security concerns, necessitating extensive research to assess the potential vulnerabilities these VLM systems may harbor. In this work, we present an in-depth survey of the attack strategies tailored for VLMs. We categorize these attacks based on their underlying objectives - namely jailbreak, camouflage, and exploitation - while also detailing the various methodologies employed for data manipulation of VLMs. Meanwhile, we outline corresponding defense mechanisms that have been proposed to mitigate these vulnerabilities. By discerning key connections and distinctions among the diverse types of attacks, we propose a compelling taxonomy for VLM attacks. Moreover, we summarize the evaluation metrics that comprehensively describe the characteristics and impact of different attacks on VLMs. Finally, we conclude with a discussion of promising future research directions that could further enhance the robustness and safety of VLMs, emphasizing the importance of ongoing exploration in this critical area of study. To facilitate community engagement, we maintain an up-to-date project page, accessible at: https://github.com/AobtDai/VLM_Attack_Paper_List.

Paper Structure

This paper contains 27 sections, 9 equations, 13 figures, 9 tables.

Figures (13)

  • Figure 1: Illustration of attacks on VLMs, in which data manipulation strategies tailored to different attack goals induce various kinds of malicious outputs. Representative methods and outcomes are highlighted for each category: zhao2024evaluatingnips (Visual Perturbation), yang2024mmacvpr (Gradient-Driven Prompts), wu2024jailbreakinggpt4vselfadversarialattacks (Human-Like Deceptive Prompts), ma2024visualroleplayuniversaljailbreakattack (Typography), yang2024mmacvpr (Jailbreak Attack), ni2024physicalbackdoorattackjeopardize (Camouflage Attack), and gao2024inducinghighenergylatencylarge (Exploitation Attack).
  • Figure 2: Google Scholar search results for VLM attacks, with the vertical axis representing the number of publications and the horizontal axis indicating the corresponding years.
  • Figure 3: An overview of the general model architecture of VLMs. In certain configurations of VLMs, the input text is transmitted directly to the projector without an intermediate encoding step. In LVLMs, LLMs are employed as decoders to facilitate the processing of multimodal input.
  • Figure 4: Illustration of the attack framework specific to VLMs and LVLMs, encompassing three key aspects: 1) the goals of VLM attacks, 2) the data manipulation strategies specialized to VLMs, and 3) the evaluation methods used to assess an attack.
  • Figure 5: Schematic illustration of the jailbreak attack, summarized from the representative works listed in the paper's table of jailbreak works. The crafted input is designed to circumvent defense mechanisms, represented by a shield icon, enabling the generation of malicious content.
  • ...and 8 more figures
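
The two-dimensional taxonomy described in the TL;DR and in the captions of Figures 1 and 4 can be pictured as a small data structure. The sketch below is only an illustration, not the authors' code: the names `AttackGoal`, `DataManipulation`, `AttackRecord`, and `select` are hypothetical, and the example entries use only the citation keys and labels that appear in the Figure 1 caption. Where the caption names only one axis for a work, the other axis is left unset rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AttackGoal(Enum):
    """First axis: what the attacker wants (the three goals named in the abstract)."""
    JAILBREAK = "jailbreak"
    CAMOUFLAGE = "camouflage"
    EXPLOITATION = "exploitation"


class DataManipulation(Enum):
    """Second axis: how the input is manipulated (the four categories in the TL;DR)."""
    VISUAL_PERTURBATION = "visual perturbation"
    GRADIENT_DRIVEN_PROMPTS = "gradient-driven prompts"
    HUMAN_LIKE_DECEPTIVE_PROMPTS = "human-like deceptive prompts"
    TYPOGRAPHY = "typography"


@dataclass(frozen=True)
class AttackRecord:
    """One surveyed work, tagged along both axes (hypothetical record type)."""
    citation_key: str
    goal: Optional[AttackGoal] = None                  # None: not stated in the caption
    manipulation: Optional[DataManipulation] = None    # None: not stated in the caption


# Labels copied from the Figure 1 caption; nothing beyond the caption is assumed.
RECORDS: List[AttackRecord] = [
    AttackRecord("zhao2024evaluatingnips",
                 manipulation=DataManipulation.VISUAL_PERTURBATION),
    AttackRecord("yang2024mmacvpr",
                 goal=AttackGoal.JAILBREAK,
                 manipulation=DataManipulation.GRADIENT_DRIVEN_PROMPTS),
    AttackRecord("ni2024physicalbackdoorattackjeopardize",
                 goal=AttackGoal.CAMOUFLAGE),
    AttackRecord("gao2024inducinghighenergylatencylarge",
                 goal=AttackGoal.EXPLOITATION),
]


def select(records: List[AttackRecord],
           goal: Optional[AttackGoal] = None,
           manipulation: Optional[DataManipulation] = None) -> List[AttackRecord]:
    """Return the records matching every axis that was specified."""
    return [
        r for r in records
        if (goal is None or r.goal is goal)
        and (manipulation is None or r.manipulation is manipulation)
    ]


if __name__ == "__main__":
    for record in select(RECORDS, goal=AttackGoal.JAILBREAK):
        print(record.citation_key, record.manipulation)
```

Keeping the two axes as separate enumerations mirrors the survey's framing: a work can be looked up by goal, by manipulation strategy, or by both, without forcing every work into a single flat category.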