
Mask-based Membership Inference Attacks for Retrieval-Augmented Generation

Mingrui Liu, Sixiao Zhang, Cheng Long

TL;DR

This work addresses privacy risks in Retrieval-Augmented Generation (RAG) by introducing Mask-Based Membership Inference Attacks (MBA), which detect whether a target document is stored in a RAG knowledge base. MBA uses a proxy language model to generate $M$ challenging masks from the target document, then prompts the RAG system to predict the masked terms; a Binary Membership Inference Classifier infers membership by comparing the number of correct mask predictions against the threshold $\gamma \cdot M$. The authors design a robust mask-generation pipeline (fragmented words, misspellings, and proxy-LM masking) and validate MBA on three public QA datasets, showing ROC AUC gains of more than 20% over strong baselines, as well as robustness to paraphrasing, defenses, and different retrievers and LLMs. The findings highlight significant privacy risks in RAG systems and provide a practical, non-parametric attack that remains effective across diverse configurations, informing both risk assessment and potential countermeasures.
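
The threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `normalize`, `count_correct`, and `is_member` are hypothetical, the case-insensitive exact-match criterion is assumed, and so is the use of `>=` (rather than a strict `>`) when comparing the count of correct predictions against $\gamma \cdot M$.

```python
from typing import Sequence


def normalize(word: str) -> str:
    # Assumed matching rule: case-insensitive, ignoring surrounding punctuation and quotes.
    return word.strip().strip(".,;:!?\"'").lower()


def count_correct(truth: Sequence[str], predicted: Sequence[str]) -> int:
    # zip stops at the shorter list, so masks the RAG system left unanswered count as wrong.
    return sum(normalize(t) == normalize(p) for t, p in zip(truth, predicted))


def is_member(truth: Sequence[str], predicted: Sequence[str], gamma: float) -> bool:
    # M is the number of masks; declare "member" when correct >= gamma * M
    # (the >= comparison is an assumption; the paper may use a strict inequality).
    m = len(truth)
    if m == 0:
        raise ValueError("at least one mask is required")
    return count_correct(truth, predicted) >= gamma * m


if __name__ == "__main__":
    truth = ["Hypertension", "metoprolol", "twice", "dizziness", "cardiologist"]
    predicted = ["hypertension", "Metoprolol.", "daily", "dizziness", "doctor"]
    print(count_correct(truth, predicted))         # 3
    print(is_member(truth, predicted, gamma=0.5))  # True, since 3 >= 0.5 * 5
```

Raising $\gamma$ makes the classifier more conservative: with the same three correct answers out of five masks, $\gamma = 0.7$ would classify the document as a non-member.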

Abstract

Retrieval-Augmented Generation (RAG) has become an effective approach to mitigating hallucinations in large language models (LLMs) by incorporating up-to-date and domain-specific knowledge. Recently, there has been a trend of storing up-to-date or copyrighted data in RAG knowledge databases instead of using it for LLM training. This practice has raised concerns about Membership Inference Attacks (MIAs), which aim to detect whether a specific target document is stored in the RAG system's knowledge database so as to protect the rights of data producers. While research has focused on enhancing the trustworthiness of RAG systems, existing MIAs for RAG systems remain largely insufficient. Previous work either relies solely on the RAG system's judgment or is easily influenced by other documents or the LLM's internal knowledge, making it unreliable and lacking in explainability. To address these limitations, we propose a Mask-Based Membership Inference Attacks (MBA) framework. Our framework first employs a masking algorithm that effectively masks a certain number of words in the target document. The masked text is then used to prompt the RAG system, which is asked to predict the masked words. If the target document appears in the knowledge database, the masked text will retrieve the complete target document as context, allowing for accurate mask prediction. Finally, we adopt a simple yet effective threshold-based method to infer the membership of the target document from the accuracy of its mask predictions. Our mask-based approach is more document-specific, making the RAG system's generation less susceptible to distractions from other documents or the LLM's internal knowledge. Extensive experiments demonstrate the effectiveness of our approach compared to existing baseline models.
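
The mask-and-query step in the abstract can be sketched as follows. This is a hypothetical illustration only: MBA selects words with a proxy language model and handles fragmented words and misspellings, whereas this sketch substitutes a simple "mask the $M$ longest words" heuristic. The `[MASK_k]` placeholder format, the prompt wording, and the functions `mask_document`, `build_prompt`, and `parse_answers` are all assumptions and do not reproduce the prompt template in Figure 5.

```python
import re
from typing import Dict, List, Tuple

PUNCT = ".,;:!?"


def mask_document(doc: str, m: int) -> Tuple[str, List[str]]:
    # Split on whitespace (the rejoined text uses single spaces) and keep alphabetic words.
    tokens = doc.split()
    candidates = [i for i, t in enumerate(tokens) if t.strip(PUNCT).isalpha()]
    # Hypothetical stand-in for proxy-LM difficulty scoring: prefer longer words.
    # Python's sort is stable, so ties keep document order.
    by_length = sorted(candidates, key=lambda i: -len(tokens[i].strip(PUNCT)))
    chosen = sorted(by_length[:m])  # number the masks in reading order
    answers: List[str] = []
    for k, i in enumerate(chosen, start=1):
        core = tokens[i].strip(PUNCT)
        answers.append(core)
        tokens[i] = tokens[i].replace(core, f"[MASK_{k}]", 1)  # keep trailing punctuation
    return " ".join(tokens), answers


def build_prompt(masked: str, m: int) -> str:
    # Hypothetical prompt asking the RAG system to fill every mask in a fixed format.
    lines = "\n".join(f"[MASK_{k}]: <answer>" for k in range(1, m + 1))
    return (
        "Use the retrieved context to fill in each masked word in the text below.\n"
        f"Text: {masked}\n"
        f"Answer in exactly this format:\n{lines}"
    )


def parse_answers(response: str, m: int) -> List[str]:
    # Collect "[MASK_k]: word" pairs; masks the model skipped become empty strings.
    found: Dict[int, str] = {}
    for k, word in re.findall(r"\[MASK_(\d+)\]:\s*(\S+)", response):
        found[int(k)] = word.strip(PUNCT)
    return [found.get(k, "") for k in range(1, m + 1)]


if __name__ == "__main__":
    masked, answers = mask_document("The patient reported persistent headaches and mild fever.", 2)
    print(masked)   # The patient reported [MASK_1] [MASK_2] and mild fever.
    print(answers)  # ['persistent', 'headaches']
```

The prompt would be sent to the RAG system, whose retriever sees the masked text as the query; the parsed answers can then be compared with the saved ground-truth words to count correct predictions.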

Paper Structure

This paper contains 42 sections, 3 equations, 10 figures, 3 tables, 4 algorithms.

Figures (10)

  • Figure 1: Distributions of indicators for member and non-member samples under different methods on the HealthCareMagic-100k dataset, visualised using kernel density estimation (KDE).
  • Figure 2: The overview of our proposed MBA framework.
  • Figure 3: Performance comparison when varying $M$.
  • Figure 4: Performance comparison when varying $\gamma$.
  • Figure 5: The prompt template to predict the masked words.
  • ...and 5 more figures
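
The TL;DR reports ROC AUC, and Figures 3 and 4 study sensitivity to $M$ and $\gamma$. The following is a hedged sketch of how such metrics could be computed from per-document scores. It assumes each score is the number of correct mask predictions divided by $M$, so that thresholding at $\gamma$ matches the $\gamma \cdot M$ rule. `roc_auc` uses the pairwise (Mann-Whitney) definition of AUC, and `accuracy_at_gamma` evaluates the threshold rule at a single $\gamma$. Both function names are hypothetical and are not taken from the paper's evaluation code.

```python
from typing import Sequence


def roc_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    # AUC = probability that a random member outscores a random non-member; ties count 1/2.
    pairs = len(member_scores) * len(nonmember_scores)
    if pairs == 0:
        raise ValueError("need at least one member and one non-member score")
    wins = 0.0
    for a in member_scores:
        for b in nonmember_scores:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / pairs


def accuracy_at_gamma(
    member_scores: Sequence[float], nonmember_scores: Sequence[float], gamma: float
) -> float:
    # Score = correct / M, so "score >= gamma" is the same test as "correct >= gamma * M".
    tp = sum(s >= gamma for s in member_scores)      # members correctly flagged
    tn = sum(s < gamma for s in nonmember_scores)    # non-members correctly rejected
    return (tp + tn) / (len(member_scores) + len(nonmember_scores))
```

ROC AUC is threshold-free, so it summarizes separability across all values of $\gamma$, while `accuracy_at_gamma` shows how one particular choice of $\gamma$ performs; sweeping $\gamma$ over a grid gives a curve analogous to the comparison in Figure 4.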