Table of Contents
Fetching ...

Benchmarking Poisoning Attacks against Retrieval-Augmented Generation

Baolei Zhang, Haoran Xin, Jiatong Li, Dongzhe Zhang, Minghong Fang, Zhuqing Liu, Lihai Nie, Zheli Liu

TL;DR

This work introduces RAG Security Bench (RSB), a comprehensive benchmark to assess poisoning attacks and defenses across retrieval-augmented generation. It spans 15 QA datasets (with expansions), 13 attacks, 7 defenses, and 6 advanced RAG paradigms, including multi-turn, multimodal, and LLM-agent systems. Key findings show that while many attacks are potent on standard data, their effectiveness diminishes with richer knowledge; nonetheless, several targeted attacks (notably CRAG-AK) sustain impact in expansion settings, and current defenses are generally insufficient or impose large accuracy costs. The study underscores the persistent security gaps in RAG systems and calls for more robust, generalizable defenses that explicitly consider retrieval dynamics and adversarial context.

Abstract

Retrieval-Augmented Generation (RAG) has proven effective in mitigating hallucinations in large language models by incorporating external knowledge during inference. However, this integration introduces new security vulnerabilities, particularly to poisoning attacks. Although prior work has explored various poisoning strategies, a thorough assessment of their practical threat to RAG systems remains missing. To address this gap, we propose the first comprehensive benchmark framework for evaluating poisoning attacks on RAG. Our benchmark covers 5 standard question answering (QA) datasets and 10 expanded variants, along with 13 poisoning attack methods and 7 defense mechanisms, representing a broad spectrum of existing techniques. Using this benchmark, we conduct a comprehensive evaluation of all included attacks and defenses across the full dataset spectrum. Our findings show that while existing attacks perform well on standard QA datasets, their effectiveness drops significantly on the expanded versions. Moreover, our results demonstrate that various advanced RAG architectures, such as sequential, branching, conditional, and loop RAG, as well as multi-turn conversational RAG, multimodal RAG systems, and RAG-based LLM agent systems, remain susceptible to poisoning attacks. Notably, current defense techniques fail to provide robust protection, underscoring the pressing need for more resilient and generalizable defense strategies.

Benchmarking Poisoning Attacks against Retrieval-Augmented Generation

TL;DR

This work introduces RAG Security Bench (RSB), a comprehensive benchmark to assess poisoning attacks and defenses across retrieval-augmented generation. It spans 15 QA datasets (with expansions), 13 attacks, 7 defenses, and 6 advanced RAG paradigms, including multi-turn, multimodal, and LLM-agent systems. Key findings show that while many attacks are potent on standard data, their effectiveness diminishes with richer knowledge; nonetheless, several targeted attacks (notably CRAG-AK) sustain impact in expansion settings, and current defenses are generally insufficient or impose large accuracy costs. The study underscores the persistent security gaps in RAG systems and calls for more robust, generalizable defenses that explicitly consider retrieval dynamics and adversarial context.

Abstract

Retrieval-Augmented Generation (RAG) has proven effective in mitigating hallucinations in large language models by incorporating external knowledge during inference. However, this integration introduces new security vulnerabilities, particularly to poisoning attacks. Although prior work has explored various poisoning strategies, a thorough assessment of their practical threat to RAG systems remains missing. To address this gap, we propose the first comprehensive benchmark framework for evaluating poisoning attacks on RAG. Our benchmark covers 5 standard question answering (QA) datasets and 10 expanded variants, along with 13 poisoning attack methods and 7 defense mechanisms, representing a broad spectrum of existing techniques. Using this benchmark, we conduct a comprehensive evaluation of all included attacks and defenses across the full dataset spectrum. Our findings show that while existing attacks perform well on standard QA datasets, their effectiveness drops significantly on the expanded versions. Moreover, our results demonstrate that various advanced RAG architectures, such as sequential, branching, conditional, and loop RAG, as well as multi-turn conversational RAG, multimodal RAG systems, and RAG-based LLM agent systems, remain susceptible to poisoning attacks. Notably, current defense techniques fail to provide robust protection, underscoring the pressing need for more resilient and generalizable defense strategies.

Paper Structure

This paper contains 46 sections, 3 equations, 6 figures, 45 tables.

Figures (6)

  • Figure 1: Illustration of the standard workflow in a RAG system.
  • Figure 2: Results of poisoning attacks under different LLMs of RAG on NQ dataset. LLM versions in Appendix \ref{['appendix_sec:details_of_llms']}.
  • Figure 3: The results of poisoning attacks under different top-$K$ of RAG on NQ dataset.
  • Figure 4: The number of correct-answer texts among top-5 for each targeted query on NQ, NQ-EX-M, and NQ-EX-L datasets.
  • Figure 5: The results of poisoning attacks under different LLMs of RAG on NQ-EX-M and NQ-EX-L datasets.
  • ...and 1 more figures