Jailbreaking Large Vision Language Models in Intelligent Transportation Systems

Badhan Chandra Das, Md Tasnim Jawad, Md Jueal Mia, M. Hadi Amini, Yanzhao Wu

TL;DR

This work investigates the security of Large Vision Language Models embedded in Intelligent Transportation Systems by introducing a typography-based jailbreaking attack that leverages multi-turn prompting. A transportation-focused harmful-query dataset is constructed, and a novel attack pipeline blends adversarial captions into images to induce unsafe outputs, which are evaluated against open-source and closed-source LVLMs. A two-layer defense combining pattern-based filtering and a zero-shot classifier significantly reduces attack success, with GPT-4 toxicity scoring and manual checks validating the findings. The results highlight substantial vulnerabilities in ITS LVLM deployments and emphasize the need for stronger, multimodal defenses to ensure safety in real-world transportation scenarios.

Abstract

Large Vision Language Models (LVLMs) demonstrate strong capabilities in multimodal reasoning and in many real-world applications, such as visual question answering. However, LVLMs are highly vulnerable to jailbreaking attacks. This paper systematically analyzes the vulnerabilities of LVLMs integrated into Intelligent Transportation Systems (ITS) under carefully crafted jailbreaking attacks. First, we construct a dataset of harmful transportation-related queries, following OpenAI's prohibited categories, to which LVLMs should not respond. Second, we introduce a novel jailbreaking attack that exploits the vulnerabilities of LVLMs through image typography manipulation and multi-turn prompting. Third, we propose a multi-layered response filtering defense technique to prevent the model from generating inappropriate responses. We perform extensive experiments with the proposed attack and defense on state-of-the-art LVLMs (both open-source and closed-source). To evaluate the attack method and defense technique, we use GPT-4's judgment to determine the toxicity score of the generated responses, complemented by manual verification. Further, we compare our proposed jailbreaking method with existing jailbreaking techniques and highlight the severe security risks that image typography manipulation and multi-turn prompting pose to LVLMs integrated into ITS.
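The multi-layered response filtering mentioned above can be illustrated with a minimal sketch, assuming a first layer of regular-expression patterns and a second layer that calls a zero-shot classifier. Everything named here is hypothetical and not taken from the paper: the patterns, the `REFUSAL` message, the `"harmful"` label, the 0.5 threshold, and the `toy_classify` stand-in (a real deployment would presumably plug in an actual zero-shot classification model behind the same callable interface).

```python
import re
from typing import Callable, Dict, List, Optional, Pattern

# Hypothetical refusal text returned when a response is blocked.
REFUSAL = "I cannot help with that request."

# Illustrative patterns only; the paper's actual pattern list is not given here.
DEFAULT_PATTERNS: List[str] = [
    r"\bdisable\b.*\btraffic (signal|light)s?\b",
    r"\bstep[- ]by[- ]step\b.*\b(hack|sabotage|derail)\b",
]

# A classifier maps a text to a score per label, e.g. {"harmful": 0.9}.
Classifier = Callable[[str], Dict[str, float]]


def compile_patterns(patterns: List[str]) -> List[Pattern[str]]:
    """Compile patterns case-insensitively, letting '.' span newlines."""
    return [re.compile(p, re.IGNORECASE | re.DOTALL) for p in patterns]


def pattern_layer(response: str, patterns: List[Pattern[str]]) -> bool:
    """Layer 1: return True if any blocked pattern occurs in the response."""
    return any(p.search(response) for p in patterns)


def classifier_layer(response: str, classify: Classifier, threshold: float) -> bool:
    """Layer 2: return True if the 'harmful' score reaches the threshold."""
    return classify(response).get("harmful", 0.0) >= threshold


def filter_response(
    response: str,
    classify: Classifier,
    patterns: Optional[List[str]] = None,
    threshold: float = 0.5,
) -> str:
    """Return the response unchanged, or REFUSAL if either layer flags it."""
    compiled = compile_patterns(patterns if patterns is not None else DEFAULT_PATTERNS)
    if pattern_layer(response, compiled):
        return REFUSAL
    if classifier_layer(response, classify, threshold):
        return REFUSAL
    return response


def toy_classify(text: str) -> Dict[str, float]:
    """Stand-in for a zero-shot classifier (assumption): a keyword heuristic."""
    return {"harmful": 0.9 if "sabotage" in text.lower() else 0.1}


if __name__ == "__main__":
    print(filter_response("The speed limit on this road is 50 km/h.", toy_classify))
    print(filter_response("Here is how to disable the traffic signals.", toy_classify))
```

Running the cheap pattern check first means the classifier is only consulted for responses that pass it, which is one plausible reason to order the layers this way.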

Paper Structure

This paper contains 15 sections, 5 equations, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Overview of the Proposed Method for Jailbreaking LVLMs with Typography Manipulation and Multi-turn Prompting [Q: Query, R: Response]