MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks

Hyeonjeong Ha, Qiusi Zhan, Jeonghwan Kim, Dimitrios Bralios, Saikrishna Sanniboina, Nanyun Peng, Kai-Wei Chang, Daniel Kang, Heng Ji

TL;DR

MM-PoisonRAG presents the first systematic study of knowledge poisoning in multimodal Retrieval-Augmented Generation, introducing the Localized Poisoning Attack (LPA) and the Globalized Poisoning Attack (GPA) to evaluate targeted and universal threats. In experiments on MMQA and WebQA with diverse retrievers and MLLMs, LPA produces attacker-controlled outputs with success rates of up to roughly 56%, while GPA can drive end-to-end accuracy to near 0% with a single poisoned entry, exposing severe fragilities in multimodal RAG. The attacks transfer across retrievers and remain potent under black-box conditions, and paraphrase-based defenses fail to mitigate them, underscoring the need for cross-modal, modality-aware defenses. The work highlights the practical risks of relying on external multimodal KBs for grounding and motivates robust retrieval verification and cross-modal consistency checks to safeguard multimodal RAG systems.

Abstract

Multimodal large language models with Retrieval Augmented Generation (RAG) have significantly advanced tasks such as multimodal question answering by grounding responses in external text and images. This grounding improves factuality, reduces hallucination, and extends reasoning beyond parametric knowledge. However, this reliance on external knowledge poses a critical yet underexplored safety risk: knowledge poisoning attacks, where adversaries deliberately inject adversarial multimodal content into external knowledge bases to steer the model toward generating incorrect or even harmful responses. To expose such vulnerabilities, we propose MM-PoisonRAG, the first framework to systematically design knowledge poisoning in multimodal RAG. We introduce two complementary attack strategies: Localized Poisoning Attack (LPA), which implants targeted multimodal misinformation to manipulate specific queries, and Globalized Poisoning Attack (GPA), which inserts a single adversarial knowledge entry to broadly disrupt reasoning and induce nonsensical responses across all queries. Comprehensive experiments across tasks, models, and access settings show that LPA achieves targeted manipulation with attack success rates of up to 56%, while GPA completely disrupts model generation, reducing accuracy to 0% with just a single adversarial knowledge injection. Our results reveal the fragility of multimodal RAG and highlight the urgent need for defenses against knowledge poisoning.
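
As a rough illustration of why a single injected entry can matter, the toy sketch below ranks knowledge-base entries by cosine similarity to a query and shows an added entry displacing the ground-truth one at top-1. This is not the paper's attack procedure: the 3-dimensional embeddings, the entry names, and the `cosine` and `retrieve` helpers are all hypothetical, chosen only to make the retrieval step concrete.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, kb, k=1):
    """Return the names of the k entries most similar to the query."""
    ranked = sorted(kb, key=lambda name: cosine(query, kb[name]), reverse=True)
    return ranked[:k]


# Hypothetical embeddings, invented for illustration only.
query = [1.0, 0.0, 0.0]
kb = {
    "ground_truth": [0.9, 0.3, 0.0],
    "distractor": [0.1, 0.9, 0.2],
}

clean_top1 = retrieve(query, kb, k=1)

# Assumed adversary: adds one entry whose embedding lies closer to the
# query than the ground-truth entry does.
kb["poisoned"] = [0.99, 0.05, 0.0]
poisoned_top1 = retrieve(query, kb, k=1)

print("top-1 before poisoning:", clean_top1)
print("top-1 after poisoning: ", poisoned_top1)
```

In this toy setting the generator would then be conditioned on the injected entry instead of the ground-truth one, which is the retrieval-to-generation cascade the abstract describes.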

Paper Structure

This paper contains 44 sections, 5 equations, 14 figures, 10 tables.

Figures (14)

  • Figure 1: Knowledge Poisoning Attacks on Multimodal RAG Framework. MM-PoisonRAG injects adversarial multimodal content into external knowledge bases, cascading it from retrieval to generation. We introduce two attack strategies: (1) Localized Poisoning Attack implants targeted, query-specific misinformation, guiding MLLMs into producing attacker-defined answers (e.g., White), and (2) Globalized Poisoning Attack inserts a single untargeted adversarial entry that broadly corrupts generation, driving irrelevant answers (e.g., Sorry) for all queries.
  • Figure 2: Visualization of joint embedding. t-SNE projection into 3D space shows that image and text embeddings form separate clusters.
  • Figure 3: Globalized poisoning attack results on MMQA and WebQA. Rt denotes GPA-Rt, and RtRrGen means GPA-RtRrGen. Rt. and Rr. stand for retriever and reranker, respectively. Capt. stands for caption. The values in red show drops in retrieval recall and accuracy compared to those before poisoning attacks.
  • Figure 3: Recall and accuracy for original and poisoned context after applying the GPA-RtRrGen attack.
  • Figure 4: Similarity scores of the ground-truth (GT) and poisoned image embedding with the query embedding.
  • ...and 9 more figures