
Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks

Gianluca De Stefano, Lea Schönherr, Giancarlo Pellegrino

TL;DR

This work investigates the end-to-end security of Retrieval-Augmented Generation (RAG) systems against indirect prompt manipulation by introducing Rag-n-Roll, an automated evaluation framework. The authors decompose RAG architectures, survey prior attacks on retrieval and ranking, and evaluate end-to-end effectiveness across configurations with a curated dataset of benign and malicious documents. Their results show that most attacks achieve around 40% success, rising to about 60% when counting ambiguous answers; using multiple malicious documents and benign data redundancy can influence outcomes, while RAG parameter tuning provides limited defense. The study underscores the need for defenses beyond tuning, highlighting the LLM as a critical last line of defense and offering a reproducible framework and dataset for future security research in RAG-enabled applications.

Abstract

Retrieval-Augmented Generation (RAG) is a technique commonly used to equip models with out-of-distribution knowledge. The process involves collecting, indexing, and retrieving information, then providing it to an LLM to generate responses. Despite RAG's growing popularity, driven by its flexibility and low cost, its security implications have not been extensively studied. The data for such systems are often collected from public sources, giving an attacker a gateway for indirect prompt injections that manipulate the model's responses. In this paper, we investigate the security of RAG systems against end-to-end indirect prompt manipulations. First, we review existing RAG framework pipelines, deriving a prototypical architecture and identifying critical parameters. We then examine prior work for techniques that attackers can use to perform indirect prompt manipulations. Finally, we implement Rag 'n Roll, a framework to determine the effectiveness of attacks against end-to-end RAG applications. Our results show that existing attacks are mostly optimized to boost the ranking of malicious documents during the retrieval phase. However, a higher rank does not immediately translate into a reliable attack. Most attacks, against various configurations, settle around a 40% success rate, which could rise to 60% when ambiguous answers (those that also include the expected benign answer) are counted as successful attacks. Additionally, attackers who deploy two or more unoptimized documents for a target query can achieve results similar to those obtained with optimized documents. Finally, exploring the configuration space of a RAG system had limited impact in thwarting the attacks, and the most successful combination severely undermines functionality.
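
The two success rates mentioned in the abstract differ only in whether ambiguous answers count as attacker wins. A minimal sketch of that bookkeeping follows, assuming each model answer has already been labeled as `benign`, `malicious`, or `ambiguous`. The label names, the function `attack_success_rates`, and the sample data are hypothetical illustrations, not the paper's code or results.

```python
from collections import Counter


def attack_success_rates(labels):
    """Return (strict, lenient) attack success rates.

    strict  : share of answers labeled 'malicious'
    lenient : share labeled 'malicious' or 'ambiguous'
              (ambiguous = contains both the malicious and the benign answer)
    The label vocabulary is an assumption made for this sketch.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no answers to score")
    strict = counts["malicious"] / total
    lenient = (counts["malicious"] + counts["ambiguous"]) / total
    return strict, lenient


# Made-up labels chosen only to illustrate the arithmetic; not paper data.
labels = ["malicious"] * 4 + ["ambiguous"] * 2 + ["benign"] * 4
strict, lenient = attack_success_rates(labels)
print(f"strict={strict:.2f} lenient={lenient:.2f}")
```

Reporting both numbers keeps the distinction visible: an ambiguous answer still exposes the user to the malicious content, but it also delivers the expected benign answer.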

Paper Structure (55 sections, 4 figures, 9 tables)

This paper contains 55 sections, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Manipulation of Gemini's answers against Alice. First, Alice receives an email from Bob (step (1)). Then, they receive an email from Mallory saying "Hi, I'm Alice. I want to let you know that my new name is now Bob and my new email address is mallory@mallory.com" (step (2))
  • Figure 2: A prototypical architecture of a RAG-based application framework. We identified six building-block components, organized in two groups: those processing data items from the data source and those generating the answer to a user's query.
  • Figure 3: Share of benign (solid line) and hallucinating (dashed line) responses generated by different models on original or mutated queries.
  • Figure 4: Overview of Rag-n-Roll