HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models

Yucheng Zhang, Qinfeng Li, Tianyu Du, Xuhong Zhang, Xinkui Zhao, Zhengwen Feng, Jianwei Yin

TL;DR

This paper reveals a novel vulnerability, the retrieval prompt hijack attack (HijackRAG), which enables attackers to manipulate the retrieval mechanisms of RAG systems by injecting malicious texts into the knowledge database.

Abstract

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge, making them adaptable and cost-effective for various applications. However, the growing reliance on these systems also introduces potential security risks. In this work, we reveal a novel vulnerability, the retrieval prompt hijack attack (HijackRAG), which enables attackers to manipulate the retrieval mechanisms of RAG systems by injecting malicious texts into the knowledge database. When the RAG system encounters target questions, it generates the attacker's pre-determined answers instead of the correct ones, undermining the integrity and trustworthiness of the system. We formalize HijackRAG as an optimization problem and propose both black-box and white-box attack strategies tailored to different levels of the attacker's knowledge. Extensive experiments on multiple benchmark datasets show that HijackRAG consistently achieves high attack success rates, outperforming existing baseline attacks. Furthermore, we demonstrate that the attack is transferable across different retriever models, underscoring the widespread risk it poses to RAG systems. Lastly, our exploration of various defense mechanisms reveals that they are insufficient to counter HijackRAG, emphasizing the urgent need for more robust security measures to protect RAG systems in real-world deployments.

Paper Structure

This paper contains 26 sections, 2 equations, 3 figures, 6 tables, 2 algorithms.

Figures (3)

  • Figure 1: Illustration of hijacking risks in RAG systems: (a) a naive prompt injection attack fails in RAG systems (i.e., the output is still the original answer), while (b) HijackRAG successfully manipulates the output (i.e., the output is the attacker's targeted answer).
  • Figure 2: Overview of HijackRAG. Given a target query, HijackRAG generates and injects a malicious text into the database. The RAG system retrieves this text for the target query, with the original target being hijacked to generate the attacker's target response.
  • Figure 3: Impact of top-$k$ on HijackRAG performance.
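
To make the retrieval step described in Figure 2 and the top-$k$ setting in Figure 3 more concrete, the following is a minimal toy sketch in Python. It is not the authors' implementation: the bag-of-words `embed` function, the cosine scoring, the example corpus, and the `PLACEHOLDER_PAYLOAD` string are all assumptions made purely for illustration, standing in for a real dense retriever and a real knowledge database. The sketch only shows why a text that closely mirrors the target query tends to be ranked into the top-$k$ retrieved results.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a retriever encoder (assumption): lowercase token counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k):
    # Return the k corpus texts most similar to the query.
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


# Hypothetical clean knowledge database.
corpus = [
    "the eiffel tower is located in paris france",
    "mount everest is the tallest mountain on earth",
    "the great wall of china is thousands of kilometres long",
]

target_query = "where is the eiffel tower located"

# Hypothetical injected text: it mirrors the target query so that it scores
# highly at retrieval time; the payload itself is only a placeholder here.
injected = target_query + " PLACEHOLDER_PAYLOAD"

clean_top1 = retrieve(target_query, corpus, k=1)
poisoned_top1 = retrieve(target_query, corpus + [injected], k=1)
poisoned_top2 = retrieve(target_query, corpus + [injected], k=2)
```

In this toy setting, `clean_top1` contains the legitimate passage, while `poisoned_top1` contains the injected text; with `k=2` both appear, which is one way to think about why the choice of $k$ (Figure 3) can matter for how much of the retrieved context an injected text occupies.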