Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything

Xiaotian Zou; Ke Li; Yongkang Chen

TL;DR

This work investigates image-to-text jailbreak risks in multimodal AI by introducing Flow-JD, a dataset suite consisting of Flow-HJD and Flow-SJD, which tests whether a model's understanding of logic flowcharts can be exploited for jailbreaking. The authors evaluate multiple SOTA VLMs, measuring Attack Success Rate (ASR) and using RoBERTa-based judgments to quantify jailbreak success. They find high vulnerability in both closed- and open-source systems (e.g., GPT-4o reaches up to 92.8% ASR on Flow-HJD). They further analyze correlations between model outputs and harmful behaviors, highlighting the influence of flowchart content and prompts on jailbreak outcomes. The results underscore urgent safety needs and point to dataset gaps, suggesting directions for more robust defenses and evaluation protocols in multimodal safety research.
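As a rough illustration of the metric, the sketch below computes ASR under the assumption that it is the fraction of model responses a judge flags as successful jailbreaks; the summary above does not give the exact formula. The function name `attack_success_rate` and the example judgments are hypothetical and are not data from the paper.

```python
from typing import Sequence


def attack_success_rate(judgments: Sequence[bool]) -> float:
    """Return the fraction of responses judged to be successful jailbreaks.

    Assumption: ASR = (# responses flagged as jailbroken) / (# responses).
    Each entry in `judgments` is a per-response verdict from some judge,
    for example a classifier; True means the response was flagged.
    """
    if not judgments:
        raise ValueError("need at least one judged response")
    flagged = sum(1 for verdict in judgments if verdict)
    return flagged / len(judgments)


# Hypothetical verdicts for five responses (illustrative only).
example_judgments = [True, False, True, True, False]
print(f"ASR = {attack_success_rate(example_judgments):.1%}")
```

In practice the boolean verdicts would come from the judge model's output; how a classifier score is thresholded into a verdict is a separate design choice that this sketch leaves open.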

Abstract

Large Visual Language Models (VLMs) such as GPT-4V have achieved remarkable success in generating comprehensive and nuanced responses. Researchers have proposed various benchmarks for evaluating the capabilities of VLMs. With the integration of visual and text inputs in VLMs, new security issues emerge, as malicious attackers can exploit multiple modalities to achieve their objectives. This has led to increasing attention on the vulnerabilities of VLMs to jailbreak. Most existing research focuses on generating adversarial or nonsensical images to jailbreak these models. However, no prior work has evaluated whether the logic understanding capabilities of VLMs on flowcharts can influence jailbreak. To fill this gap, this paper introduces a novel dataset, Flow-JD, specifically designed to evaluate the logic-based flowchart jailbreak capabilities of VLMs. We conduct an extensive evaluation on GPT-4o, GPT-4V, and five other SOTA open-source VLMs, and the jailbreak rate is up to 92.8%. Our research reveals significant vulnerabilities in current VLMs concerning image-to-text jailbreak, and these findings underscore the urgency of developing robust and effective future defenses.

Paper Structure (10 sections, 8 figures, 2 tables)

Introduction
Related Work
Models and Datasets
Evaluation
Metrics
Experiments Setting
Results
Conclusion
Future Work
Appendix

Figures (8)

Figure 1: An example of logic flowchart jailbreak in GPT-4o.
Figure 2: The correlation score between the responses and harmful behaviours.
Figure 3: The similarity of jailbreak flowcharts with corresponding text.
Figure 4: A logic jailbreak example of Flow-HJD in GPT-4o.
Figure 5: A logic jailbreak example of Flow-SJD in GPT-4o.
...and 3 more figures
