How to make Medical AI Systems safer? Simulating Vulnerabilities, and Threats in Multimodal Medical RAG System

Kaiwen Zuo, Zelin Liu, Raman Dutt, Ziyang Wang, Zhongtian Sun, Fan Mo, Pietro Liò

TL;DR

This paper addresses the safety of medical vision-language systems that use Retrieval-Augmented Generation (RAG) by introducing MedThreatRAG, a framework that simulates realistic poisoning attacks on semi-open knowledge bases. It defines three attack types (Textual Attack, Visual Attack, and Cross-Modal Conflict Injection) that degrade retrieval and generation across the Med-LVLM stack, and it demonstrates substantial performance deterioration on IU-Xray and MIMIC-CXR. Through its methodology, experiments, ablations, and case studies, the work shows that critical vulnerabilities exist even when attackers have no access to model weights. It also provides practical guidelines for safe deployment, including provenance logging, clinician veto, and targeted defenses against textual, visual, and cross-modal threats. The findings underscore the urgency of threat-aware design in multimodal medical RAG systems and offer modular defenses to improve reliability in real-world clinical settings.

Abstract

Large Vision-Language Models (LVLMs) augmented with Retrieval-Augmented Generation (RAG) are increasingly employed in medical AI to enhance factual grounding through external clinical image-text retrieval. However, this reliance creates a significant attack surface. We propose MedThreatRAG, a novel multimodal poisoning framework that systematically probes vulnerabilities in medical RAG systems by injecting adversarial image-text pairs. A key innovation of our approach is the construction of a simulated semi-open attack environment, mimicking real-world medical systems that permit periodic knowledge base updates via user or pipeline contributions. Within this setting, we introduce and emphasize Cross-Modal Conflict Injection (CMCI), which embeds subtle semantic contradictions between medical images and their paired reports. These mismatches degrade retrieval and generation by disrupting cross-modal alignment while remaining sufficiently plausible to evade conventional filters. While basic textual and visual attacks are included for completeness, CMCI demonstrates the most severe degradation. Evaluations on IU-Xray and MIMIC-CXR QA tasks show that MedThreatRAG reduces answer F1 scores by up to 27.66% and lowers LLaVA-Med-1.5 F1 rates to as low as 51.36%. Our findings expose fundamental security gaps in clinical RAG systems and highlight the urgent need for threat-aware design and robust multimodal consistency checks. Finally, we conclude with a concise set of guidelines to inform the safe development of future multimodal medical RAG systems.

Paper Structure

This paper contains 20 sections, 15 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: An overview of the multi-modal attack pipeline for medical visual question answering. The pipeline includes three attack types: (1) Textual Attack (TA) via negation-flip constraints that enforce incorrect responses, (2) Visual Attack (VA) using a diffusion model to generate synthetic X-ray images, and (3) Cross-Modal Conflict Injection (CMCI), which introduces semantic mismatches between visual and textual content. These perturbed elements populate a Malicious Knowledge Base, from which the Retriever selects the top-$M$ candidates based on image-question similarity. The Med-LVLMs Reranker evaluates content relevance, and the top-$K$ results are forwarded to the Med-LVLMs Generator, which outputs misleading answers.
  • Figure 2: 3D visualization of text, ground-truth image, and malicious image embeddings. The t-SNE projections show distinct clusters for each type, with adversarial images positioned near the ground-truth images. This suggests that the adversarial manipulation places poisoned images close to legitimate data, potentially leading to misclassification or retrieval errors.
  • Figure 3: Illustrations of multi-modal attack vulnerabilities in lung and brain diagnostic systems.
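
The retrieval flow described in the Figure 1 caption (top-$M$ retrieval by similarity, reranking, then top-$K$ forwarding to the generator) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional embeddings, the entry IDs, the report strings, and the use of cosine similarity as a stand-in for the Med-LVLM reranker are all assumptions chosen for illustration. It only shows how a poisoned entry whose embedding lies near the query can survive both stages and reach the generator's context.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_top_m(query_vec, knowledge_base, m):
    """Stage 1: keep the M entries most similar to the query embedding."""
    ranked = sorted(
        knowledge_base,
        key=lambda entry: cosine(query_vec, entry["embedding"]),
        reverse=True,
    )
    return ranked[:m]

def rerank_top_k(candidates, relevance_fn, k):
    """Stage 2: rescore the candidates and keep the K most relevant."""
    return sorted(candidates, key=relevance_fn, reverse=True)[:k]

# Hypothetical toy knowledge base: three clean entries and one poisoned
# entry whose embedding was placed close to a clean one (assumed values).
knowledge_base = [
    {"id": "clean-1", "embedding": [0.90, 0.10, 0.00],
     "report": "No pleural effusion.", "poisoned": False},
    {"id": "clean-2", "embedding": [0.10, 0.90, 0.00],
     "report": "Heart size is normal.", "poisoned": False},
    {"id": "clean-3", "embedding": [0.00, 0.20, 0.90],
     "report": "No acute bony abnormality.", "poisoned": False},
    {"id": "poison-1", "embedding": [0.88, 0.15, 0.02],
     "report": "Large pleural effusion.", "poisoned": True},
]

query = [1.0, 0.1, 0.0]  # assumed embedding of the image-question pair

candidates = retrieve_top_m(query, knowledge_base, m=3)

# Stand-in reranker: reuses cosine similarity. In the pipeline of Figure 1
# this role is played by the Med-LVLMs Reranker.
context = rerank_top_k(
    candidates,
    relevance_fn=lambda entry: cosine(query, entry["embedding"]),
    k=2,
)

for entry in context:
    print(entry["id"], "poisoned" if entry["poisoned"] else "clean")
```

In this toy setup the poisoned entry sits next to a clean entry in the context passed to the generator, carrying a report that contradicts it. Under the stated assumptions, this is the kind of exposure the guidelines mentioned in the TL;DR (provenance logging, consistency checks) are meant to reduce.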