A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations

Mang Ye, Xuankun Rong, Wenke Huang, Bo Du, Nenghai Yu, Dacheng Tao

TL;DR

This survey addresses the safety of Large Vision-Language Models (LVLMs) by synthesizing attacks, defenses, and evaluations within a lifecycle framework that separates inference-time and training-time considerations. It formalizes access capabilities, attack objectives, and strategies, then categorizes a broad spectrum of attacks (white-box, gray-box, black-box; label poisoning; backdoors) and corresponding defenses (input sanitization, internal optimization, output validation, and multi-stage integration; data-driven and strategy-driven training defenses). The paper also surveys evaluation methodologies and benchmarks, and provides an in-depth safety assessment of the latest LVLM, DeepSeek Janus-Pro, identifying significant safety gaps and guiding future directions toward robust, cross-modal safety and reliable deployment in high-stakes settings. Overall, the work offers a centralized resource with a public repository to accelerate research on LVLM safety, including practical recommendations and standardized evaluation frameworks.

Abstract

With the rapid advancement of Large Vision-Language Models (LVLMs), ensuring their safety has emerged as a crucial area of research. This survey provides a comprehensive analysis of LVLM safety, covering key aspects such as attacks, defenses, and evaluation methods. We introduce a unified framework that integrates these interrelated components, offering a holistic perspective on the vulnerabilities of LVLMs and the corresponding mitigation strategies. Through an analysis of the LVLM lifecycle, we introduce a classification framework that distinguishes between inference and training phases, with further subcategories to provide deeper insights. Furthermore, we highlight limitations in existing research and outline future directions aimed at strengthening the robustness of LVLMs. As part of our research, we conduct a set of safety evaluations on the latest LVLM, DeepSeek Janus-Pro, and provide a theoretical analysis of the results. Our findings provide strategic recommendations for advancing LVLM safety and ensuring their secure and reliable deployment in high-stakes, real-world applications. This survey aims to serve as a cornerstone for future research, facilitating the development of models that not only push the boundaries of multimodal intelligence but also adhere to the highest standards of security and ethical integrity. To support the growing research in this field, we have also created a public repository that continuously compiles and updates the latest work on LVLM safety: https://github.com/XuankunRong/Awesome-LVLM-Safety

Paper Structure

This paper contains 41 sections, 16 equations, 2 figures, 9 tables.

Figures (2)

  • Figure 1: Overview of the survey. Best viewed in color.
  • Figure 2: Illustration of Inference-Phase Attack Methods. Detailed explanations can be found in the sections on White-Box Attacks, Gray-Box Attacks, and Black-Box Attacks.