Generating Is Believing: Membership Inference Attacks against Retrieval-Augmented Generation

Yuying Li; Gaoyang Liu; Chen Wang; Yang Yang

TL;DR

This work examines membership privacy risks in retrieval-augmented generation (RAG) and proposes S$^2$MIA, a semantic-similarity–based membership inference attack. The method computes $S_{sem}$, a BLEU-based similarity between the target input $x_t$ and the RAG-generated output, and combines it with a perplexity term $PPL_{gen}$ to form a membership score. Two inference modes are introduced: a threshold-based approach and a model-based classifier on $(S_{sem},PPL_{gen})$. Across five RAG configurations and multiple LLMs and retrievers, S$^2$MIA outperforms five baselines and evades three defenses, indicating strong membership leakage from the external database. The results highlight concrete privacy risks for RAG external data stores and motivate more robust defenses in domains with sensitive information and heterogeneous data sources.

Abstract

Retrieval-Augmented Generation (RAG) is a state-of-the-art technique that mitigates issues such as hallucinations and knowledge staleness in Large Language Models (LLMs) by retrieving relevant knowledge from an external database to assist in content generation. Existing research has demonstrated potential privacy risks associated with the LLMs of RAG. However, the privacy risks posed by the integration of an external database, which often contains sensitive data such as medical records or personal identities, have remained largely unexplored. In this paper, we bridge this gap by focusing on the membership privacy of RAG's external database, that is, determining whether a given sample is part of the database. Our basic idea is that if a sample is in the external database, it will exhibit a high degree of semantic similarity to the text generated by the RAG system. We present S$^2$MIA, a \underline{M}embership \underline{I}nference \underline{A}ttack that utilizes the \underline{S}emantic \underline{S}imilarity between a given sample and the content generated by the RAG system. With S$^2$MIA, we demonstrate the potential to breach the membership privacy of the RAG database. Extensive experimental results show that S$^2$MIA achieves strong inference performance compared with five existing MIAs and is able to evade three representative defenses.
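The scoring idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: `simple_bleu` is a simplified, smoothed BLEU-style score written for this example, `perplexity` assumes per-token log-probabilities are available from some language model, and the AND-style decision rule and threshold values in `is_member` are assumptions about how a threshold-based mode might look.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference, candidate, max_n=2):
    """Simplified BLEU-style score in (0, 1]; a stand-in for S_sem.

    Uses clipped n-gram precision with add-one smoothing and a brevity
    penalty. Real BLEU implementations differ in smoothing details.
    """
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        log_precisions.append(math.log((clipped + 1) / (total + 1)))
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities; a stand-in for PPL_gen."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def is_member(s_sem, ppl_gen, sim_threshold, ppl_threshold):
    """Assumed threshold rule: high similarity AND low perplexity => member."""
    return s_sem >= sim_threshold and ppl_gen <= ppl_threshold


if __name__ == "__main__":
    target = "the patient was discharged after two days of observation"
    generated = "the patient was discharged after two days"
    s_sem = simple_bleu(target, generated)
    ppl_gen = perplexity([math.log(0.6)] * 7)  # made-up log-probabilities
    print(s_sem, ppl_gen, is_member(s_sem, ppl_gen, 0.5, 10.0))
```

In the model-based mode, the same pair $(S_{sem}, PPL_{gen})$ would be fed to a trained classifier instead of being compared against fixed thresholds.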

Paper Structure (15 sections, 4 equations, 2 figures, 4 tables)

Introduction
Preliminary
Retrieval-Augmented Generation
Threat Model
Method
Membership Score Generation
Membership Inference
Experiment and result
Experiment Settings
Results
Main Results
Impact of Different Similarity Metrics
Impact of Different Retrievers
Defending against S$^2$MIA
Conclusion

Figures (2)

Figure 1: Framework of S$^2$MIA. Given a target sample, we divide it into the query text and the remaining text. We input the query text into a given RAG system, which retrieves the samples most similar to the query text from the external database. These retrieved texts serve as the context for the LLM to generate responses. Consequently, if the target sample is present in the RAG's external database, the generated text should exhibit high semantic similarity to the remaining text and low perplexity.
Figure 2: Illustration of the distribution of member versus non-member samples
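
The flow described in the Figure 1 caption can be sketched end to end with a toy example. Everything named here is hypothetical: `split_sample` uses an assumed 50/50 word split, `toy_rag_generate` is a stand-in that picks the database entry with the largest word overlap and echoes its continuation (a real system would use a retriever plus an LLM), and `word_overlap` is a simple Jaccard score used only to make the member versus non-member contrast visible.

```python
def split_sample(text, query_fraction=0.5):
    """Split a target sample into (query text, remaining text) by word count."""
    words = text.split()
    cut = max(1, int(len(words) * query_fraction))
    return " ".join(words[:cut]), " ".join(words[cut:])


def toy_rag_generate(query_text, database):
    """Toy stand-in for a RAG system (not a real retriever or LLM)."""
    query_words = set(query_text.lower().split())
    # "Retrieval": pick the entry sharing the most words with the query.
    best = max(database, key=lambda doc: len(query_words & set(doc.lower().split())))
    # "Generation": echo the continuation if the entry starts with the query.
    if best.startswith(query_text):
        return best[len(query_text):].strip()
    return best


def word_overlap(a, b):
    """Jaccard overlap of word sets; a crude proxy for semantic similarity."""
    a_words, b_words = set(a.lower().split()), set(b.lower().split())
    return len(a_words & b_words) / max(1, len(a_words | b_words))


def probe(sample, database):
    """Query the toy RAG with the first part and score against the rest."""
    query_text, remaining_text = split_sample(sample)
    generated = toy_rag_generate(query_text, database)
    return word_overlap(generated, remaining_text)


database = [
    "the patient was prescribed aspirin after the surgery last week",
    "rain is expected across the northern coast tomorrow",
]
member_score = probe(database[0], database)
non_member_score = probe(
    "the committee approved a new budget for road repairs this year", database
)
print(member_score, non_member_score)
```

In this toy setting the member sample scores higher than the non-member sample, which is the separation the attack relies on; real generations are paraphrased and noisy, so the gap would be smaller.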
