When Data Manipulation Meets Attack Goals: An In-depth Survey of Attacks for VLMs
Aobotao Dai, Xinyu Ma, Lei Chen, Songze Li, Lin Wang
TL;DR
This survey analyzes safety vulnerabilities in Vision-Language Models (VLMs) through a two-dimensional taxonomy based on attack goals and data-manipulation strategies. It identifies four data-manipulation categories (visual perturbation, gradient-driven prompts, human-like deceptive prompts, and typography) and situates jailbreak, camouflage, and exploitation attacks within this framework, while mapping defenses and evaluation metrics. Key contributions include a refined taxonomy that distinguishes jailbreak from camouflage and treats camouflage and exploitation as distinct categories, and a discussion of evaluation paradigms (effectiveness, stealthiness, transferability, efficiency) alongside datasets and defenses. The work underscores practical implications for deploying robust VLMs in real-world settings and outlines future directions, such as rethinking physical attacks, safety in embodied AI, and establishing comprehensive benchmarks to guide defense development.
Abstract
Vision-Language Models (VLMs) have gained considerable prominence in recent years due to their ability to integrate and process both textual and visual information. This integration has significantly enhanced performance across a diverse range of applications, such as scene perception and robotics. However, the deployment of VLMs has also raised critical safety and security concerns, necessitating extensive research into the vulnerabilities these systems may harbor. In this work, we present an in-depth survey of attack strategies tailored to VLMs. We categorize these attacks by their underlying objectives (jailbreak, camouflage, and exploitation) and detail the data-manipulation methodologies they employ. We also outline the corresponding defense mechanisms that have been proposed to mitigate these vulnerabilities. By identifying key connections and distinctions among the different types of attacks, we propose a taxonomy for VLM attacks. Moreover, we summarize the evaluation metrics that describe the characteristics and impact of different attacks on VLMs. Finally, we discuss promising future research directions that could further enhance the robustness and safety of VLMs. To facilitate community engagement, we maintain an up-to-date project page at https://github.com/AobtDai/VLM_Attack_Paper_List.
