Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home?

Yujin Choi, Youngjoo Park, Junyoung Byun, Jaewook Lee, Jinseong Park

TL;DR

This work targets privacy risks from membership inference attacks in retrieval-augmented generation by introducing Mirabel, a Gumbel-distribution-based detector that flags member-attacks using the maximum query-document similarity and a derived threshold. A simple detect-and-hide defense conceals the targeted document when a member-attack is detected, while preserving utility for benign queries and remaining agnostic to the specific RAG implementation. Empirical evaluation across multiple MIAs, datasets, and models demonstrates strong detection performance and defense efficacy, achieving indistinguishability comparable to or better than DP-based baselines with minimal utility loss. The approach is model-agnostic, computationally light, and can be combined with existing privacy-preserving strategies to strengthen RAG privacy in practical deployments.

Abstract

Retrieval-augmented generation (RAG) mitigates the hallucination problem in large language models (LLMs) and has proven effective for personalized use. However, delivering private retrieved documents directly to LLMs introduces vulnerability to membership inference attacks (MIAs), which try to determine whether a target data point exists in the private external database. Based on the insight that MIA queries typically exhibit high similarity to only one target document, we introduce a novel similarity-based MIA detection framework designed for RAG systems. With the proposed method, we show that a simple detect-and-hide strategy can successfully obfuscate attackers, maintain data utility, and remain system-agnostic. We experimentally demonstrate its detection and defense performance against various state-of-the-art MIA methods and its adaptability to existing RAG systems.
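
The detect-and-hide idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's threshold derivation is not reproduced on this page, so the sketch assumes the non-target similarities are roughly i.i.d. normal and uses the standard extreme-value result that their maximum is then approximately Gumbel-distributed. The function names `gumbel_threshold` and `detect_and_hide`, the significance level `alpha`, and the choice to estimate the mean and standard deviation from all scores except the top one are hypothetical choices made for illustration.

```python
import math
import statistics


def gumbel_threshold(scores, alpha=0.01):
    """Upper (1 - alpha) quantile of a Gumbel approximation to max(scores).

    Assumption (not from the paper): scores are roughly i.i.d. normal, so the
    classical normalizing constants for the normal maximum apply.
    """
    n = len(scores)
    mu = statistics.fmean(scores)
    sigma = statistics.stdev(scores)
    root = math.sqrt(2.0 * math.log(n))
    # Location and scale of the Gumbel limit for the maximum of n normals.
    shift = (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * root)
    loc = mu + sigma * (root - shift)
    scale = sigma / root
    # Gumbel quantile function: loc - scale * ln(-ln(p)) with p = 1 - alpha.
    return loc - scale * math.log(-math.log(1.0 - alpha))


def detect_and_hide(scores, alpha=0.01):
    """Return (flagged, hidden_index) for one query's similarity scores.

    The query is flagged when its top-1 similarity exceeds the threshold
    estimated from the remaining scores; the top-1 document is then hidden.
    """
    top = max(range(len(scores)), key=scores.__getitem__)
    rest = scores[:top] + scores[top + 1:]
    flagged = scores[top] > gumbel_threshold(rest, alpha)
    return flagged, (top if flagged else None)
```

When a query is flagged, a caller would drop the document at `hidden_index` from the retrieval candidates and run the rest of the RAG pipeline unchanged; unflagged queries pass through untouched. A smaller `alpha` makes false alarms on benign queries rarer at the cost of missing weaker attacks.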

Paper Structure

This paper contains 43 sections, 11 equations, 9 figures, 13 tables, 1 algorithm.

Figures (9)

  • Figure 1: Distributions of similarity scores between queries and retrieved data. We visualize both the full similarity distributions and the top-1 similarities. A Gumbel-based threshold $\bigcup_q S_q$ is marked for reference.
  • Figure 2: Illustration of our proposed Mirabel. We run our detector to classify whether an input query is a member-attack query $q_a^m$. If it is detected as $q_a^m$, we hide the targeted document from data retrieval and proceed with the standard RAG system.
  • Figure 3: Histograms of IA scores for member and non-member queries across different datasets. After defense, the member and non-member score distributions become indistinguishable.
  • Figure 4: Prompt template given to the RAG generator. It conditions the model on the retrieved contexts and enforces grounded, concise answers with an explicit "I don't know" fallback.
  • Figure 5: Prompt template for the GPT-4o agent that classifies incoming queries as either Natural (task-aligned) or Context-Probing (potentially MIA).
  • ...and 4 more figures