MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks

Hyeonjeong Ha; Qiusi Zhan; Jeonghwan Kim; Dimitrios Bralios; Saikrishna Sanniboina; Nanyun Peng; Kai-Wei Chang; Daniel Kang; Heng Ji

TL;DR

MM-PoisonRAG presents the first systematic study of knowledge poisoning in multimodal Retrieval-Augmented Generation (RAG). It introduces two attacks: the Localized Poisoning Attack (LPA), a targeted threat against specific queries, and the Globalized Poisoning Attack (GPA), a universal threat against all queries. In experiments on MMQA and WebQA with diverse retrievers and MLLMs, LPA produces attacker-controlled outputs with success rates of up to roughly 56%, while GPA drives end-to-end accuracy to near 0% with a single poisoned entry. The attacks transfer across retrievers and remain potent under black-box conditions, and paraphrase-based defenses fail to mitigate them. These results expose the practical risk of relying on external multimodal KBs for grounding and motivate modality-aware defenses such as robust retrieval verification and cross-modal consistency checks.

Abstract

Multimodal large language models with Retrieval-Augmented Generation (RAG) have significantly advanced tasks such as multimodal question answering by grounding responses in external text and images. This grounding improves factuality, reduces hallucination, and extends reasoning beyond parametric knowledge. However, this reliance on external knowledge poses a critical yet underexplored safety risk: knowledge poisoning attacks, in which adversaries deliberately inject adversarial multimodal content into external knowledge bases to steer the model toward generating incorrect or even harmful responses. To expose such vulnerabilities, we propose MM-PoisonRAG, the first framework to systematically design knowledge poisoning attacks in multimodal RAG. We introduce two complementary attack strategies: Localized Poisoning Attack (LPA), which implants targeted multimodal misinformation to manipulate specific queries, and Globalized Poisoning Attack (GPA), which inserts a single adversarial knowledge entry to broadly disrupt reasoning and induce nonsensical responses across all queries. Comprehensive experiments across tasks, models, and access settings show that LPA achieves targeted manipulation with attack success rates of up to 56%, while GPA reduces model accuracy to 0% with just a single adversarial knowledge injection. Our results reveal the fragility of multimodal RAG and highlight the urgent need for defenses against knowledge poisoning.
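
To make the single-entry claim concrete, the toy sketch below shows how one injected entry can displace the correct item for every query when a retriever ranks by cosine similarity. It is an illustration only, not the authors' attack: the 3-dimensional embeddings are hand-picked, the names (kb, queries, top1_accuracy) are hypothetical, and placing the poisoned embedding at the centroid of the query embeddings is an assumed stand-in for whatever optimization the paper actually uses. It also measures retrieval only, not the downstream generation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top1(query_vec, kb):
    """Return the name of the KB entry most similar to the query."""
    return max(kb, key=lambda name: cosine(query_vec, kb[name]))

def top1_accuracy(queries, kb):
    """Fraction of queries whose top-1 retrieved entry is the gold entry."""
    hits = sum(top1(vec, kb) == gold for vec, gold in queries)
    return hits / len(queries)

# Hand-picked toy embeddings (assumption): the queries sit close together,
# while each gold KB entry points in a different direction.
queries = [
    ([1.0, 0.2, 0.0], "d1"),
    ([1.0, 0.0, 0.2], "d2"),
    ([1.0, -0.2, 0.0], "d3"),
]
kb = {
    "d1": [0.3, 1.0, 0.0],
    "d2": [0.3, 0.0, 1.0],
    "d3": [0.3, -1.0, 0.0],
}

clean_acc = top1_accuracy(queries, kb)

# Assumed stand-in for a global attack: one entry whose embedding is the
# centroid of the query embeddings, so it is close to every query at once.
centroid = [sum(col) / len(queries) for col in zip(*(vec for vec, _ in queries))]
poisoned_kb = dict(kb, poison=centroid)

poisoned_acc = top1_accuracy(queries, poisoned_kb)
print(clean_acc, poisoned_acc)
```

In this toy setup the clean retriever returns the gold entry for all three queries, and after the single injection the poisoned entry ranks first for all of them. The effect depends on the queries being closer to each other than to their gold entries, which is an assumption of the example rather than a result from the paper.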
