A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

Daizong Liu; Mingyu Yang; Xiaoye Qu; Pan Zhou; Yu Cheng; Wei Hu

TL;DR

The paper surveys security vulnerabilities of LVLMs, focusing on four attack classes: adversarial perturbations, jailbreaks, prompt injections, and data poisoning/backdoors. It synthesizes notations, formulations, and resources (datasets, models, tools, defenses) and provides a taxonomy with strengths and limitations. It discusses practical considerations, transferability, biases, and human-in-the-loop aspects. The review aims to guide robust defenses and standardized benchmarks for safer LVLM deployment.

Abstract

With the significant development of large models in recent years, Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. Compared to traditional Large Language Models (LLMs), LVLMs present both greater potential and greater challenges, because they are closer to multi-resource real-world applications and must handle the complexity of multi-modal processing. However, the vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage. In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks. Specifically, we first introduce the background of attacks targeting LVLMs, including attack preliminaries, attack challenges, and attack resources. Then, we systematically review the development of LVLM attack methods, such as adversarial attacks that manipulate model outputs, jailbreak attacks that exploit model vulnerabilities for unauthorized actions, prompt injection attacks that engineer the prompt type and pattern, and data poisoning that affects model training. Finally, we discuss promising directions for future research. We believe that our survey provides insights into the current landscape of LVLM vulnerabilities, inspiring more researchers to explore and mitigate potential safety issues in LVLM development. The latest papers on LVLM attacks are continuously collected at https://github.com/liudaizong/Awesome-LVLM-Attack.

Paper Structure (37 sections, 2 equations, 3 figures, 6 tables)

Introduction
Background
Preliminary of LVLM Attack
Notations and Definitions
Attack Formulation
The challenges of LVLM Attack
Current Resources for LVLM Attack
White-box attack tools
Gray-box attack tools
Black-box attack tools
Datasets
LVLM models
Evaluation metrics
Defense strategies
Methods
...and 22 more sections

Figures (3)

Figure 1: Overview of existing attack methods on LVLMs. LVLM attackers generally manipulate prompts (visual or textual) to control the LVLM's inference, producing specific or malicious outputs, or achieving a jailbreak. For example, in a backdoor attack, poisoned data is mixed into the training set during the model training stage to embed a trigger for subsequent attacks. Adversarial and jailbreak attacks, in contrast, use the model's gradient information from back-propagation to optimize the attack.
Figure 2: The taxonomy of existing attack methods on LVLMs. We categorize attacks into four types: adversarial attacks, jailbreak attacks, prompt injection attacks, and data poisoning/backdoor attacks. Additionally, each category is further divided into subclasses based on the methods of implementation, with each branch listing the works associated with that category.
Figure 3: Detailed illustration of the four types of LVLM attacks. Specifically, adversarial attacks aim to perturb the input samples via adversarial learning to mislead LVLMs; jailbreak attacks exploit weaknesses in the model to bypass its intended restrictions, potentially leading to the execution of unauthorized commands or access to sensitive information; prompt injection attacks engineer the prompts to alter the model's behavior or outputs in unintended ways, which can be particularly dangerous in systems that rely on precise and accurate responses; data poisoning/backdoor attacks tamper with the training data to undermine the model’s performance and reliability.
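
The gradient-based perturbation idea mentioned in the captions above can be illustrated with a minimal sketch. It is not taken from the survey: it assumes a toy logistic classifier standing in for an LVLM, a three-value "image", and a single sign-gradient step in the style of FGSM. The names `predict` and `fgsm_perturb`, the weights, and the budget `epsilon` are all hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of class 1 under a toy logistic model (stand-in for a real LVLM)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fgsm_perturb(weights, bias, x, label, epsilon):
    """One sign-gradient step that increases the loss on the true label.

    For logistic regression with cross-entropy loss, dL/dx_i = (p - label) * w_i.
    """
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    sign = [(g > 0) - (g < 0) for g in grad]
    # Each input moves by at most epsilon and is clipped to the valid [0, 1] range.
    return [min(1.0, max(0.0, xi + epsilon * s)) for xi, s in zip(x, sign)]

# Hypothetical toy values, chosen only to make the effect visible.
weights = [2.0, -3.0, 1.0]
bias = 0.0
x = [0.6, 0.2, 0.5]   # "clean" input with values in [0, 1]
label = 1             # true class

x_adv = fgsm_perturb(weights, bias, x, label, epsilon=0.25)
clean_prob = predict(weights, bias, x)
adv_prob = predict(weights, bias, x_adv)
print(f"clean p(class 1) = {clean_prob:.3f}, adversarial p(class 1) = {adv_prob:.3f}")
```

In this toy setting a bounded change to each input value is enough to push the predicted probability of the true class below 0.5. Attacks on real LVLMs typically iterate such steps and back-propagate through the full vision and language stack, which this sketch does not attempt to model.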
