Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation

Yinuo Liu, Zenghui Yuan, Guiyao Tie, Jiawen Shi, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong

TL;DR

This work introduces Poisoned-MRAG, the first knowledge poisoning attack on multimodal RAG systems. It formalizes the attack as an optimization problem and proposes two cross-modal attack strategies, dirty-label and clean-label, tailored to the attacker's knowledge and goals.

Abstract

Multimodal retrieval-augmented generation (RAG) enhances the visual reasoning capability of vision-language models (VLMs) by dynamically accessing information from external knowledge bases. In this work, we introduce Poisoned-MRAG, the first knowledge poisoning attack on multimodal RAG systems. Poisoned-MRAG injects a few carefully crafted image-text pairs into the multimodal knowledge database, manipulating VLMs to generate the attacker-desired response to a target query. Specifically, we formalize the attack as an optimization problem and propose two cross-modal attack strategies, dirty-label and clean-label, tailored to the attacker's knowledge and goals. Our extensive experiments across multiple knowledge databases and VLMs show that Poisoned-MRAG outperforms existing methods, achieving up to 98% attack success rate with just five malicious image-text pairs injected into the InfoSeek database (481,782 pairs). Additionally, we evaluate four defense strategies (paraphrasing, duplicate removal, structure-driven mitigation, and purification) and demonstrate their limited effectiveness and trade-offs against Poisoned-MRAG. Our results highlight the effectiveness and scalability of Poisoned-MRAG, underscoring its potential as a significant threat to multimodal RAG systems.

Paper Structure

This paper contains 43 sections, 11 equations, 16 figures, 8 tables, 1 algorithm.

Figures (16)

  • Figure 1: Overview of Poisoned-MRAG. In the attacking stage, the attacker creates malicious image-text pairs, which are then collected by multimodal RAG into the knowledge database alongside the benign pairs. In the inference stage, the malicious pairs are ranked higher than the benign ones, influencing the VLM to generate responses aligned with the attacker's desired outcome.
  • Figure 2: Crafting the image to maximize image-text pair similarity in our clean-label attack.
  • Figure 3: Impact of the number of retrieved candidates $k$ and injected malicious pairs $N$, evaluated on InfoSeek.
  • Figure 4: Impact of different loss terms (image-image and pair-pair) in clean-label attack, evaluated on InfoSeek.
  • Figure 5: Impact of iteration number in clean-label attack, evaluated on InfoSeek.
  • ...and 11 more figures
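
The retrieval behavior described in the Figure 1 caption, together with the roles of $k$ (retrieved candidates) and $N$ (injected malicious pairs) from Figure 3, can be illustrated with a toy sketch. This is not the paper's method: the embeddings are random vectors rather than outputs of a real multimodal retriever, and each "malicious" entry is simply the query embedding plus small noise, a stand-in for whatever the attack's optimization would actually produce. All names (`retrieve_top_k`, `random_vector`, the `label` field) are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve_top_k(query, database, k):
    """Return the k database entries most similar to the query embedding."""
    ranked = sorted(database, key=lambda e: cosine(query, e["emb"]), reverse=True)
    return ranked[:k]


def random_vector(dim, rng):
    """Random Gaussian vector standing in for a real embedding (assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)
dim = 32
query = random_vector(dim, rng)

# Benign knowledge base: embeddings unrelated to the target query.
database = [{"label": "benign", "emb": random_vector(dim, rng)} for _ in range(1000)]

# Inject N entries placed close to the query in embedding space.
# Assumption: "query + small noise" replaces the paper's crafted pairs.
N = 5
for _ in range(N):
    emb = [q + 0.05 * rng.gauss(0.0, 1.0) for q in query]
    database.append({"label": "malicious", "emb": emb})

k = 5
top = retrieve_top_k(query, database, k)
malicious_in_top_k = sum(e["label"] == "malicious" for e in top)
print(f"{malicious_in_top_k}/{k} retrieved entries are malicious")
```

In this toy setting, the injected entries occupy the top-$k$ slots because they sit far closer to the query than any random benign vector. Varying `k` and `N` shows how the share of malicious context passed to the generator changes, which is the kind of trade-off Figure 3 examines on InfoSeek.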