Medusa: Cross-Modal Transferable Adversarial Attacks on Multimodal Medical Retrieval-Augmented Generation

Yingjia Shang, Yi Liu, Huimin Wang, Furong Li, Wenfang Sun, Wu Chengyu, Yefeng Zheng

TL;DR

Medusa presents a black-box, cross-modal transferable adversarial framework for multimodal medical retrieval-augmented generation (MMed-RAG) systems. It combines a cross-modal misalignment loss $L_{\text{MPIL}}$ (a multi-positive InfoNCE loss), a surrogate model ensemble, invariant risk minimization (IRM), and dual-loop optimization to achieve high attack transferability. It reaches an average attack success rate above 90% on pneumonia and edema tasks across multiple retrievers and generators, and it remains robust against common input defenses. The work exposes substantial safety concerns in MMed-RAG deployments and calls for robust defense benchmarks and safety-focused evaluation in high-stakes medical AI. Overall, Medusa advances the understanding of cross-modal vulnerabilities and provides a rigorous methodology for assessing and benchmarking the robustness of retrieval-augmented systems built on medical vision-language models (VLMs).

Abstract

With the rapid advancement of retrieval-augmented vision-language models, multimodal medical retrieval-augmented generation (MMed-RAG) systems are increasingly adopted in clinical decision support. These systems enhance medical applications by performing cross-modal retrieval to integrate relevant visual and textual evidence for tasks such as report generation and disease diagnosis. However, their complex architecture also introduces underexplored adversarial vulnerabilities, particularly via visual input perturbations. In this paper, we propose Medusa, a novel framework for crafting cross-modal transferable adversarial attacks on MMed-RAG systems in a black-box setting. Specifically, Medusa formulates the attack as a perturbation optimization problem, leveraging a multi-positive InfoNCE loss (MPIL) to align adversarial visual embeddings with medically plausible but malicious textual targets, thereby hijacking the retrieval process. To enhance transferability, we adopt a surrogate model ensemble and design a dual-loop optimization strategy augmented with invariant risk minimization (IRM). Extensive experiments on two real-world medical tasks, namely medical report generation and disease diagnosis, demonstrate that Medusa achieves an average attack success rate above 90% across various generation models and retrievers under appropriate parameter configurations, remains robust against four mainstream defenses, and outperforms state-of-the-art baselines. Our results reveal critical vulnerabilities in MMed-RAG systems and highlight the necessity of robustness benchmarking in safety-critical medical applications. The code and data are available at https://anonymous.4open.science/r/MMed-RAG-Attack-F05A.
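To make the MPIL idea concrete, here is a minimal NumPy sketch that assumes the common multi-positive (supervised-contrastive-style) form $L_{\text{MPIL}} = -\frac{1}{|P|}\sum_{p \in P} \log \frac{\exp(s(v, t_p)/\tau)}{\sum_{t \in P \cup N} \exp(s(v, t)/\tau)}$, where $v$ is the image embedding, $P$ the target texts, $N$ the other candidate texts, and $s$ cosine similarity. The exact Medusa formulation may differ. The function names, the temperature $\tau = 0.07$, and the ensemble objective are hypothetical; in particular, the variance-of-losses penalty across surrogate encoders is a V-REx-style stand-in for the paper's IRM term, not the authors' method.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def multi_positive_infonce(img_emb, pos_txt, neg_txt, tau=0.07):
    """Hypothetical multi-positive InfoNCE for one image embedding.

    img_emb: (d,) image embedding.
    pos_txt: (P, d) embeddings of the target ("positive") texts.
    neg_txt: (N, d) embeddings of the remaining candidate texts.
    Returns the mean over positives of -log softmax(similarity / tau).
    """
    v = l2_normalize(img_emb)
    cands = l2_normalize(np.concatenate([pos_txt, neg_txt], axis=0))
    logits = cands @ v / tau  # (P + N,) temperature-scaled cosine similarities
    # Numerically stable log-sum-exp over all candidates (the softmax denominator).
    m = logits.max()
    log_denom = m + np.log(np.exp(logits - m).sum())
    log_probs = logits[: len(pos_txt)] - log_denom  # log-probabilities of positives
    return -log_probs.mean()


def ensemble_objective(per_surrogate_losses, lam=1.0):
    """Mean loss over surrogate encoders plus a variance penalty.

    Assumption: the variance-of-risks penalty (V-REx style) stands in for IRM;
    it discourages solutions that only work for one surrogate.
    """
    losses = np.asarray(per_surrogate_losses, dtype=float)
    return losses.mean() + lam * losses.var()


if __name__ == "__main__":
    # When every candidate matches the image, all five logits tie,
    # so each positive gets probability 1/5 and the loss equals log(5).
    img = np.ones(4)
    print(multi_positive_infonce(img, np.ones((2, 4)), np.ones((3, 4))))
    print(ensemble_objective([0.0, 2.0]))  # mean 1.0 + variance 1.0
```

Averaging over positives lets several medically plausible target texts pull on the same embedding, and the lower bound of the loss is $\log |P|$, reached when probability mass is spread evenly over the positives only.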

Paper Structure

This paper contains 24 sections, 17 equations, 12 figures, 7 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Workflow of the MMed-RAG system.
  • Figure 2: Overview of the proposed Medusa attack, which includes (a) cross-modal misalignment, (b) transferability enhancement, and (c) dual-loop optimization.
  • Figure 3: Retrieval misleading performance under different $k$ ($\epsilon = 2/255$).
  • Figure 4: Confusion matrix between DeepSeek predictions and human annotations.
  • Figure 5: Retrieval misleading performance under different $\epsilon$ and different $k$ for ENS [liu2017delving], SVRE [xiong2022stochastic], and the proposed Medusa.
  • ...and 7 more figures