Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation

Maya Anderson, Guy Amit, Abigail Goldsteen

TL;DR

Retrieval Augmented Generation (RAG) systems pose privacy risks when the retrieval database may contain proprietary or sensitive data. The authors present RAG-MIA, an efficient prompt-based Membership Inference Attack (MIA) that determines whether a given document is in the retrieval database in both black-box and gray-box settings, and evaluate it on the Enron and HealthcareMagic datasets across multiple models. The attack is highly effective, especially in the gray-box setting. As an initial defense, the authors add instructions to the RAG template that discourage the model from answering membership questions; the effectiveness of this defense varies by model. The work underscores the need for stronger defenses against MIAs in RAG pipelines and motivates future work on adaptive attacks and differential-privacy-based remedies.

Abstract

Retrieval Augmented Generation (RAG) systems have shown great promise in natural language processing. However, their reliance on data stored in a retrieval database, which may contain proprietary or sensitive information, introduces new privacy concerns. Specifically, an attacker may be able to infer whether a certain text passage appears in the retrieval database by observing the outputs of the RAG system, an attack known as a Membership Inference Attack (MIA). Despite the significance of this threat, MIAs against RAG systems have so far remained under-explored. This study addresses this gap by introducing an efficient and easy-to-use method for conducting MIA against RAG systems. We demonstrate the effectiveness of our attack using two benchmark datasets and multiple generative models, showing that the membership of a document in the retrieval database can be efficiently determined through the creation of an appropriate prompt in both black-box and gray-box settings. Moreover, we introduce an initial defense strategy based on adding instructions to the RAG template, which shows high effectiveness for some datasets and models. Our findings highlight the importance of implementing security countermeasures in deployed RAG systems and developing more advanced defenses to protect the privacy and security of retrieval databases.
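
To make the idea of a prompt-based attack and a template-based defense concrete, here is a minimal sketch in Python. It is not the authors' implementation: the template text, the attack question, the defense instruction, and all function names are hypothetical placeholders chosen for illustration (the paper's actual template and attack prompt appear in its Figures 1 and 3). Only the black-box case is sketched, where the attacker sees the generated text alone.

```python
# Hypothetical RAG template: the placeholders are filled with the retrieved
# documents and the user prompt. The wording is an assumption, not the paper's.
RAG_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "Context:\n{context}\n\n"
    "Question: {user_prompt}\n"
)

# Hypothetical attack-specific text that is combined with the target sample.
ATTACK_QUESTION = (
    "Answer with Yes or No. Does the following text appear in the context?"
)

# Hypothetical defense instruction added to the RAG template.
DEFENSE_INSTRUCTION = (
    "Do not answer questions about whether a text appears in the context."
)


def build_attack_prompt(sample: str) -> str:
    """Combine the attack question with the sample whose membership is tested."""
    return f'{ATTACK_QUESTION}\n"{sample}"'


def build_rag_input(retrieved_docs, user_prompt, defended=False):
    """Fill the RAG template, optionally prepending the defense instruction."""
    template = RAG_TEMPLATE
    if defended:
        template = DEFENSE_INSTRUCTION + "\n" + template
    return template.format(context="\n".join(retrieved_docs),
                           user_prompt=user_prompt)


def infer_membership(model_output: str) -> bool:
    """Black-box decision rule (an assumption): a reply starting with 'yes'
    is read as 'the sample is in the retrieval database'."""
    return model_output.strip().lower().startswith("yes")
```

In this sketch the attacker would send `build_attack_prompt(sample)` as the user prompt, the RAG system would assemble something like `build_rag_input(docs, prompt)` internally, and the attacker would pass the system's reply to `infer_membership`.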

Paper Structure

This paper contains 20 sections, 1 equation, 4 figures, 11 tables.

Figures (4)

  • Figure 1: Example RAG template for the generation phase of a RAG system. The highlighted placeholders are replaced by the fetched documents from the database and the user prompt, respectively.
  • Figure 2: Overall Flow of our MIA Attack on a RAG pipeline.
  • Figure 3: Attack prompt example for RAG-MIA. The highlighted text is the attack-specific part of the prompt, and the rest is taken from the sample for which membership is inferred.
  • Figure 4: Comparison of different attack prompts. The top row shows results for black-box attacks, evaluated using AUC-ROC and TPR@FPR. The wide bars show the TPR and the narrow bars inside them show the respective FPR values. The bottom row shows results for gray-box attacks, evaluated using AUC-ROC and TPR@lowFPR.
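
The Figure 4 caption refers to AUC-ROC and to TPR at a given FPR. As a rough illustration of how such metrics are commonly computed for a membership inference attack, the sketch below uses plain Python. The function names, the convention that a higher score means "more likely a member", and the example scores are all assumptions for illustration; the FPR thresholds the paper uses are not stated in this excerpt.

```python
def auc_roc(scores, labels):
    """AUC-ROC as the probability that a random member (label 1) receives a
    higher score than a random non-member (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(scores, labels, max_fpr):
    """Highest true-positive rate reachable by a score threshold whose
    false-positive rate does not exceed max_fpr."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for threshold in sorted(set(scores), reverse=True):
        # Predict "member" for every sample scoring at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / n_neg <= max_fpr:
            best = max(best, tp / n_pos)
    return best


# Made-up attack scores: three members (1) and three non-members (0).
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_labels = [1, 1, 0, 1, 0, 0]
example_auc = auc_roc(example_scores, example_labels)
example_tpr = tpr_at_fpr(example_scores, example_labels, max_fpr=0.0)
```

Reporting TPR at a low FPR complements AUC-ROC because it shows how many members an attacker can identify while making very few false accusations.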