Table of Contents
Fetching ...

Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks

Yanming Mu, Hao Hu, Feiyang Li, Qiao Yuan, Jiang Wu, Zichuan Liu, Pengcheng Liu, Mei Wang, Hongwei Zhou, Yuling Liu

Abstract

Retrieval-Augmented Generation (RAG) significantly mitigates the hallucinations and domain knowledge deficiency in large language models by incorporating external knowledge bases. However, the multi-module architecture of RAG introduces complex system-level security vulnerabilities. Guided by the RAG workflow, this paper analyzes the underlying vulnerability mechanisms and systematically categorizes core threat vectors such as data poisoning, adversarial attacks, and membership inference attacks. Based on this threat assessment, we construct a taxonomy of RAG defense technologies from a dual perspective encompassing both input and output stages. The input-side analysis reviews data protection mechanisms including dynamic access control, homomorphic encryption retrieval, and adversarial pre-filtering. The output-side examination summarizes advanced leakage prevention techniques such as federated learning isolation, differential privacy perturbation, and lightweight data sanitization. To establish a unified benchmark for future experimental design, we consolidate authoritative test datasets, security standards, and evaluation frameworks. To the best of our knowledge, this paper presents the first end-to-end survey dedicated to the security of RAG systems. Distinct from existing literature that isolates specific vulnerabilities, we systematically map the entire pipeline-providing a unified analysis of threat models, defense mechanisms, and evaluation benchmarks. By enabling deep insights into potential risks, this work seeks to foster the development of highly robust and trustworthy next-generation RAG systems.

Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks

Abstract

Retrieval-Augmented Generation (RAG) significantly mitigates the hallucinations and domain knowledge deficiency in large language models by incorporating external knowledge bases. However, the multi-module architecture of RAG introduces complex system-level security vulnerabilities. Guided by the RAG workflow, this paper analyzes the underlying vulnerability mechanisms and systematically categorizes core threat vectors such as data poisoning, adversarial attacks, and membership inference attacks. Based on this threat assessment, we construct a taxonomy of RAG defense technologies from a dual perspective encompassing both input and output stages. The input-side analysis reviews data protection mechanisms including dynamic access control, homomorphic encryption retrieval, and adversarial pre-filtering. The output-side examination summarizes advanced leakage prevention techniques such as federated learning isolation, differential privacy perturbation, and lightweight data sanitization. To establish a unified benchmark for future experimental design, we consolidate authoritative test datasets, security standards, and evaluation frameworks. To the best of our knowledge, this paper presents the first end-to-end survey dedicated to the security of RAG systems. Distinct from existing literature that isolates specific vulnerabilities, we systematically map the entire pipeline-providing a unified analysis of threat models, defense mechanisms, and evaluation benchmarks. By enabling deep insights into potential risks, this work seeks to foster the development of highly robust and trustworthy next-generation RAG systems.
Paper Structure (45 sections, 11 figures, 6 tables, 3 algorithms)

This paper contains 45 sections, 11 figures, 6 tables, 3 algorithms.

Figures (11)

  • Figure 1: Overview of RAG security research trends and literature composition
  • Figure 2: Structured taxonomy mapping the landscape of RAG security research. The diagram organizes the field into Threats, Defenses, and Evaluation Standards, cascading down to specific methodologies and key literature citations. To provide architectural context, the right-hand column groups these methodologies based on where they operate within the RAG pipeline, differentiating between attacks targeting specific modules (e.g., Retrievers) and defenses applied at specific stages (e.g., Full-Pipeline Defenses).
  • Figure 3: RAG technical workflow: i) Vector database construction, which involves calculating semantic vectors for data chunks via data chunking and embedding models to establish the vector database ; ii) Retriever, responsible for retrieving the top-k data chunks most relevant to the user query from the database ; iii) Generator, responsible for integrating the top-k data chunks with the user query and submitting them to the large language model for response generation .
  • Figure 4: Security threats to RAG systems: i) Data poisoning attacks, where attackers inject malicious data into the database to manipulate output results ; ii) Indirect attacks, where attackers utilize external data as a carrier to inject payloads targeting the large language model, such as prompt injection or jailbreaking, to compromise the model ; iii) Embedding inversion attacks, which are methods that reconstruct original data from embedding vectors ; iv) Adversarial attacks, which target the retrieval logic by injecting imperceptible perturbations into the data to disrupt model responses ; v) Membership inference attacks, which infer the presence of sensitive data within the database based on features such as confidence scores in RAG responses .
  • Figure 5: Mechanisms of data injection and propagation in RAG poisoning attacks. This figure details how an attacker successfully manipulates the final output of an LLM by compromising the RAG system's knowledge base. The architecture is divided into data construction, retrieval, and generation stages. The critical vulnerability occurs during data construction (purple block), where malicious data is covertly injected and processed into embedding vectors. Consequently, the standard retrieval mechanism (green block) is weaponized; it actively fetches the poisoned chunk as context for the user's query. As indicated by the red flow path, the generation component (orange block) blindly trusts this manipulated context, resulting in the successful execution of the attack and the delivery of fabricated information to the user.
  • ...and 6 more figures