MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs

Yilian Liu, Xiaojun Jia, Guoshun Nan, Jiuyang Lyu, Zhican Chen, Tao Guan, Shuyuan Luo, Zhongyi Zhai, Yang Liu

TL;DR

MIDAS enforces longer and more structured multi-image chained reasoning, which substantially increases the model's reliance on visual cues, delays the exposure of malicious semantics, and significantly reduces the model's security attention, thereby improving jailbreak performance against advanced MLLMs.

Abstract

Multimodal Large Language Models (MLLMs) have achieved remarkable performance but remain vulnerable to jailbreak attacks that can induce harmful content and undermine their secure deployment. Previous studies have shown that introducing additional inference steps, which disrupt security attention, can make MLLMs more susceptible to being misled into generating malicious content. However, these methods rely on single-image masking or isolated visual cues, which only modestly extend reasoning paths and thus achieve limited effectiveness, particularly against strongly aligned commercial closed-source models. To address this problem, we propose Multi-Image Dispersion and Semantic Reconstruction (MIDAS), a multimodal jailbreak framework that decomposes harmful semantics into risk-bearing subunits, disperses them across multiple visual clues, and leverages cross-image reasoning to gradually reconstruct the malicious intent, thereby bypassing existing safety mechanisms. MIDAS enforces longer and more structured multi-image chained reasoning, which substantially increases the model's reliance on visual cues, delays the exposure of malicious semantics, and significantly reduces the model's security attention, thereby improving jailbreak performance against advanced MLLMs. Extensive experiments across different datasets and MLLMs demonstrate that MIDAS outperforms state-of-the-art jailbreak attacks for MLLMs and achieves an average attack success rate of 81.46% across 4 closed-source MLLMs. Our code is available at this [link](https://github.com/Winnie-Lian/MIDAS).

Paper Structure

This paper contains 43 sections, 11 equations, 18 figures, 12 tables, and 1 algorithm.

Figures (18)

  • Figure 1: Overview. (a) Compared to text-only (T) and text+image (T+I) attacks that are blocked by safety filters, MIDAS leverages Game-based Visual Reasoning (GVR) to bypass defenses and induce harmful outputs. (b) Examples of visual reasoning puzzles used in MIDAS. (c) MIDAS achieves significantly higher Attack Success Rate (ASR) and Harmfulness Rating (HR) than other baselines.
  • Figure 2: Pipeline of MIDAS. (1) Text Process: extract risk-bearing units, decompose them into subunits, and replace them with placeholders; (2) Image Process: embed the subunits into multiple benign-looking puzzle images that enforce step-by-step reasoning; (3) Model Output: the model decodes puzzle fragments, reconstructs the hidden semantics, and generates harmful responses under persona-driven reasoning guidance.
  • Figure 3: Hyper-parameter sensitivity analysis. ASR and HR under different hyper-parameter settings: (a) varying the number of decomposed keywords $k$, and (b) varying the number of reasoning images $H$.
  • Figure 4: Evaluation using safety mechanisms. (a) Evaluation results using LlamaGuard, showing the percentage of safe versus unsafe responses for the original model, HIMRD, and our proposed method. (b) Attack Success Rate (ASR) across different defense strategies (VisCRA and MIDAS). The annotated values indicate the relative ASR drop compared to the baseline (No Defense).
  • Figure 5: Case study of MIDAS: a dispersed harmful query evades safety detection and is progressively reconstructed through cross-modal reasoning into a harmful output.