Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything
Xiaotian Zou, Ke Li, Yongkang Chen
TL;DR
This work investigates image-to-text jailbreak risks in multimodal AI by introducing Flow-JD, a dataset suite consisting of Flow-HJD and Flow-SJD, which test whether a model's understanding of logic flowcharts can be exploited for jailbreak. The authors evaluate multiple SOTA VLMs, measuring Attack Success Rate (ASR) and using RoBERTa-based judgments to quantify jailbreak success, and find high vulnerability in both closed- and open-source systems (e.g., GPT-4o reaches up to $92.8\%$ ASR on Flow-HJD). They further analyze correlations between model outputs and harmful behaviors, highlighting the influence of flowchart content and prompts on jailbreak outcomes. The results underscore urgent safety needs and point to dataset gaps, suggesting directions for more robust defenses and evaluation protocols in multimodal safety research.
Abstract
Large Visual Language Models (VLMs) such as GPT-4V have achieved remarkable success in generating comprehensive and nuanced responses. Researchers have proposed various benchmarks for evaluating the capabilities of VLMs. With the integration of visual and text inputs in VLMs, new security issues emerge, as malicious attackers can exploit multiple modalities to achieve their objectives. This has led to increasing attention on the vulnerabilities of VLMs to jailbreak. Most existing research focuses on generating adversarial images or nonsensical images to jailbreak these models. However, no prior work has evaluated whether the logic understanding capabilities of VLMs on flowcharts can influence jailbreak. To fill this gap, this paper introduces a novel dataset, Flow-JD, specifically designed to evaluate the logic-based flowchart jailbreak capabilities of VLMs. We conduct an extensive evaluation on GPT-4o, GPT-4V, and five other SOTA open-source VLMs, and the jailbreak rate reaches up to 92.8%. Our research reveals significant vulnerabilities in current VLMs concerning image-to-text jailbreak, and these findings underscore the urgency of developing robust and effective future defenses.
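
The jailbreak rate quoted above is an attack success rate. As a minimal sketch, assuming the rate is simply the fraction of attack prompts whose response a judge marks as jailbroken, it could be computed as follows. The function name `attack_success_rate` and the sample labels are hypothetical illustrations, not taken from the paper or its data.

```python
from typing import Sequence


def attack_success_rate(judgments: Sequence[bool]) -> float:
    """Return the fraction of attempts judged as successful jailbreaks.

    Assumption: judgments[i] is True when a judge (for example, a
    classifier) labels the i-th model response as jailbroken.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(judgments) / len(judgments)


# Hypothetical judge outputs for five prompts (illustrative only).
labels = [True, True, False, True, False]
asr = attack_success_rate(labels)
print(f"ASR = {asr:.1%}")
```

Under this assumed definition, three successes out of five attempts give an ASR of 60%.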
