Practical Poisoning Attacks against Retrieval-Augmented Generation

Baolei Zhang, Yuxi Chen, Zhuqing Liu, Lihai Nie, Tong Li, Zheli Liu, Minghong Fang

TL;DR

This work addresses the vulnerability of retrieval-augmented generation (RAG) systems to poisoning by introducing CorruptRAG, a practical single-shot poisoning framework that injects at most one poisoned text per targeted query. It formulates poisoning as a Hit Ratio Maximization ($\text{HRM}$) optimization under strict constraints and proposes two variants, CorruptRAG-AS and CorruptRAG-AK, to craft adversarial knowledge that steers the LLM toward attacker-chosen outputs. Extensive experiments on three BeIR-derived datasets show that CorruptRAG consistently outperforms baselines in attack success rate, while remaining cost-effective and resistant to several defenses, highlighting real-world security risks for RAG deployments. The findings motivate the development of stronger defenses (beyond paraphrasing or knowledge expansion) and point to directions for future work in untargeted attacks and multi-modal RAG security.

Abstract

Large language models (LLMs) have demonstrated impressive natural language processing abilities but face challenges such as hallucination and outdated knowledge. Retrieval-Augmented Generation (RAG) has emerged as a state-of-the-art approach to mitigate these issues. While RAG enhances LLM outputs, it remains vulnerable to poisoning attacks. Recent studies show that injecting poisoned text into the knowledge database can compromise RAG systems, but most existing attacks assume that the attacker can insert a sufficient number of poisoned texts per query to outnumber correct-answer texts in retrieval, an assumption that is often unrealistic. To address this limitation, we propose CorruptRAG, a practical poisoning attack against RAG systems in which the attacker injects only a single poisoned text, enhancing both feasibility and stealth. Extensive experiments conducted on multiple large-scale datasets demonstrate that CorruptRAG achieves higher attack success rates than existing baselines.

Paper Structure

This paper contains 24 sections, 4 equations, 3 figures, 15 tables.

Figures (3)

  • Figure 1: The number of truly relevant texts among the top-5 retrieved for each query on the Natural Questions dataset.
  • Figure 2: Results for different values of $N$.
  • Figure 3: Impact of $V$ in CorruptRAG-AK attack.