Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications

Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, Yu Jiang

TL;DR

Human-imperceptible retrieval poisoning exploits RAG frameworks so that malicious external content misleads LLM outputs while appearing benign to human readers. The authors analyze LangChain-based workflows to identify injection points in document parsing, text splitting, and prompting, and propose a gradient-guided mutation technique that embeds attack sequences into documents. They report an average attack success rate (ASR) of 88.33% across three open-source LLMs and 66.67% in a real-world LangChain application, using PDF, Markdown, and HTML documents. The work underscores the need for provenance-aware retrieval, robust parsing and chunking, and further defenses to harden LLM-powered applications against imperceptible retrieval poisoning.

Abstract

With the help of advanced LLM application development frameworks, a growing number of LLM-powered applications can easily augment an LLM's knowledge with external content using retrieval-augmented generation (RAG). However, the designs of these frameworks do not sufficiently consider the risks of external content, allowing attackers to undermine applications built with them. In this paper, we reveal a new threat to LLM-powered applications, termed retrieval poisoning, in which attackers steer the application toward malicious responses during the RAG process. Specifically, by analyzing LLM application frameworks, attackers can craft documents that are visually indistinguishable from benign ones. Although the documents present correct information, once they are used as reference sources for RAG, the application is misled into generating incorrect responses. Our preliminary experiments indicate that attackers can mislead LLMs with an 88.33% success rate and achieve a 66.67% success rate in a real-world application, demonstrating the potential impact of retrieval poisoning.
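
The gap between what a reader sees and what a parser extracts can be illustrated from the defensive side. The sketch below is not the authors' method and does not use any LangChain API. It assumes, purely for illustration, that invisible content arrives in an HTML document through inline `display:none` or `visibility:hidden` styles, and it compares a naive text extraction with a visibility-aware one. The names `TextExtractor`, `extract`, and `hidden_text_ratio` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextExtractor(HTMLParser):
    """Collects text nodes; optionally skips subtrees hidden by inline style.

    Assumption: only inline styles are inspected. Hiding via stylesheets,
    zero-size fonts, or matching colors is not detected by this sketch, and
    the HTML is assumed to have properly closed tags.
    """

    def __init__(self, skip_hidden):
        super().__init__()
        self.skip_hidden = skip_hidden
        self.hidden_depth = 0  # > 0 while inside a hidden subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        hidden = "display:none" in style or "visibility:hidden" in style
        # Once inside a hidden subtree, every nested element deepens it.
        if self.hidden_depth or (self.skip_hidden and hidden):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.hidden_depth:
            self.chunks.append(text)


def extract(html, skip_hidden):
    """Return the document text, with or without inline-hidden subtrees."""
    parser = TextExtractor(skip_hidden)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def hidden_text_ratio(html):
    """Fraction of naively extracted characters that a reader would not see."""
    naive = extract(html, skip_hidden=False)
    visible = extract(html, skip_hidden=True)
    if not naive:
        return 0.0
    return 1.0 - len(visible) / len(naive)


# Hypothetical sample: the visible paragraph is benign and correct,
# while the hidden span carries extra text that only a parser would pick up.
sample = (
    "<p>Install the package with pip.</p>"
    '<span style="display: none">extra hidden tokens</span>'
)

if __name__ == "__main__":
    print("naive  :", extract(sample, skip_hidden=False))
    print("visible:", extract(sample, skip_hidden=True))
    print("hidden ratio:", round(hidden_text_ratio(sample), 2))
```

A nonzero ratio only signals that the extracted text differs from the inline-visible text. A pipeline might use such a signal to flag a document for review before it is chunked and indexed, but it is a heuristic and not a complete defense.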

Paper Structure

This paper contains 9 sections, 1 equation, 3 figures, 3 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Attack scenario of retrieval poisoning.
  • Figure 2: Workflow of retrieval poisoning.
  • Figure 3: A case of retrieval poisoning on ChatChat.