From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking

Siyuan Wang; Zhuohan Long; Zhihao Fan; Zhongyu Wei

TL;DR

The paper surveys jailbreak research across LLMs and MLLMs, detailing evaluation benchmarks, attack modalities, and defense strategies. It observes that multimodal jailbreaking is less mature than unimodal work and highlights gaps in datasets, evaluation, and defense generalization. By categorizing attacks as non-parametric or parametric and as unimodal or multimodal, and by contrasting extrinsic and intrinsic defenses, the work outlines concrete directions for enhancing the robustness of vision-language models. The findings emphasize the need for diverse multimodal benchmarks, resilient defense mechanisms, and ongoing alignment efforts to ensure the safe deployment of advanced multimodal AI systems.

Abstract

The rapid development of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has exposed vulnerabilities to various adversarial attacks. This paper provides a comprehensive overview of jailbreaking research targeting both LLMs and MLLMs, highlighting recent advancements in evaluation benchmarks, attack techniques, and defense strategies. Compared with the more advanced state of unimodal jailbreaking, the multimodal domain remains underexplored. We summarize the limitations and potential research directions of multimodal jailbreaking, aiming to inspire future research and further enhance the robustness and security of MLLMs.

Paper Structure (31 sections, 2 figures)

Introduction
Preliminary of Jailbreaking
Definition of Jailbreak Attack and Defense
Necessity of Jailbreak Attack and Defense
Why Jailbreak Attack Succeed
Evaluation Datasets for Jailbreaking
Unimodal Jailbreak Datasets
Multimodal Jailbreak Datasets
Limitations and Future Directions on Multimodal Jailbreak Datasets
Jailbreak Attack
Non-parametric Attack
Non-parametric Unimodal Attack
Constructing Competing Objectives
Inducing Mismatched Generalization
Non-parametric Multimodal Attack
...and 16 more sections

Figures (2)

Figure 1: The overall illustration of our investigation on jailbreaking from LLMs to MLLMs.
Figure 2: An example of jailbreak attack and defense.
