Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

Xijie Huang; Xinyuan Wang; Hantao Zhang; Yinghao Zhu; Jiawen Xi; Jingkun An; Hao Wang; Hao Liang; Chengwei Pan

TL;DR

The paper addresses safety gaps in Medical Multimodal LLMs (MedMLLMs) by formalizing clinical mismatches and malicious queries as $2M$-attack and $O2M$-attack, and by introducing the 3MAD dataset to benchmark vulnerabilities. It proposes Multimodal Cross-Optimization ($MCM$) to jailbreak MedMLLMs through coordinated text and image perturbations, and demonstrates superior attack performance in white-box and transfer settings across four SOTA models. Key contributions include the 3MAD dataset (66K+ images, 18 modality-region combinations, 1,080 GPT-4-aided prompts, plus a Tiny variant), the $MCM$ framework, and comprehensive evaluations showing that MedMLLMs remain vulnerable despite security features like RLHF and system prompts. The work highlights urgent practical implications for deploying MedMLLMs in clinical environments and suggests concrete defenses to enhance safety for open-source medical multimodal systems.

Abstract

Security concerns related to Large Language Models (LLMs) have been extensively explored, yet the safety implications for Multimodal Large Language Models (MLLMs), particularly in medical contexts (MedMLLMs), remain insufficiently studied. This paper delves into the underexplored security vulnerabilities of MedMLLMs, especially when deployed in clinical environments where the accuracy and relevance of question-and-answer interactions are critically tested against complex medical challenges. By combining existing clinical medical data with atypical natural phenomena, we define the mismatched malicious attack (2M-attack) and introduce its optimized version, known as the optimized mismatched malicious attack (O2M-attack or 2M-optimization). Using the voluminous 3MAD dataset that we construct, which covers a wide range of medical image modalities and harmful medical scenarios, we conduct a comprehensive analysis and propose the MCM optimization method, which significantly enhances the attack success rate on MedMLLMs. Evaluations with this dataset and attack methods, including white-box attacks on LLaVA-Med and transfer attacks (black-box) on four other SOTA models, indicate that even MedMLLMs designed with enhanced security features remain vulnerable to security breaches. Our work underscores the urgent need for a concerted effort to implement robust security measures and enhance the safety and efficacy of open-source MedMLLMs, particularly given the potential severity of jailbreak attacks and other malicious or clinically significant exploits in medical settings. Our code is available at https://github.com/dirtycomputer/O2M_attack.

Paper Structure (35 sections, 11 equations, 22 figures, 7 tables)

Introduction
Related Work
Development in MLLMs for the medical field.
Jailbreak and Adversarial Attacks against LLMs and MLLMs
Advanced Benchmarking in MLLMs and Medical Domains
Methodology
Threat Model
3MAD Dataset Construction
Multimodal Cross-optimization Method (MCM)
Experiments
Experimental Setups
Evaluation Metrics
Results and Analysis
Analysis of adversarial attack methods on LLaVA-Med
Analysis of transfer attack
...and 20 more sections

Figures (22)

Figure 1: (a): Common radiologist errors during diagnosis include mistaking MRI images for CT images. (b): The deviation in mismatched phenomena is more pronounced in medical datasets. (c): This indicates a significant semantic gap between medical and natural contexts, with mismatches in the medical field disrupting semantic coherence more severely.
Figure 2: (a): The potential mismatches or malicious actions in clinical settings. (b): For each malicious query, we match it with mismatched attributes to construct a 2M-attack. Additionally, we apply the jailbreak method to create an O2M-attack, aiming to deceive large multi-modal models into responding to queries that should not be answered.
Figure 3: Left: Components of images in the 3MAD (9 modalities and 12 body parts). Right: Components of normal prompts in the 3MAD (18 medical tasks or requirements).
Figure 4: Statistics of images in 3MAD-66K dataset.
Figure 5: Statistics on the distribution of image source regions and affiliated institutions in 3MAD dataset.
...and 17 more figures
