Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home?
Yujin Choi, Youngjoo Park, Junyoung Byun, Jaewook Lee, Jinseong Park
TL;DR
This work targets privacy risks from membership inference attacks (MIAs) in retrieval-augmented generation (RAG) by introducing Mirabel, a Gumbel-distribution-based detector that flags member-attacks using the maximum query-document similarity and a derived threshold. A simple detect-and-hide defense conceals the targeted document when a member-attack is detected, while preserving utility for benign queries and remaining agnostic to the specific RAG implementation. Empirical evaluation across multiple MIAs, datasets, and models demonstrates strong detection performance and defense efficacy, achieving indistinguishability comparable to or better than differential-privacy (DP) based baselines with minimal utility loss. The approach is model-agnostic, computationally light, and can be combined with existing privacy-preserving strategies to strengthen RAG privacy in practical deployments.
Abstract
Retrieval-augmented generation (RAG) mitigates the hallucination problem in large language models (LLMs) and has proven effective for personalized use cases. However, passing private retrieved documents directly to LLMs exposes the system to membership inference attacks (MIAs), which try to determine whether a target data point exists in the private external database. Based on the insight that MIA queries typically exhibit high similarity to only one target document, we introduce a novel similarity-based MIA detection framework designed for RAG systems. With the proposed method, we show that a simple detect-and-hide strategy can successfully confuse attackers, maintain data utility, and remain agnostic to the underlying RAG system. We empirically demonstrate its detection and defense performance against various state-of-the-art MIA methods and its adaptability to existing RAG systems.
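To make the detect-and-hide idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the similarities between a query and the stored documents are roughly Gaussian, so that their maximum can be approximated by a Gumbel distribution with the classical extreme-value norming constants. The threshold formula, the function names `gumbel_threshold` and `detect_and_hide`, and the parameters `alpha` and `top_k` are all illustrative assumptions; the summary above does not specify how the paper derives its threshold.

```python
import math
import statistics


def gumbel_threshold(similarities, alpha=0.01):
    """Upper (1 - alpha) quantile of a Gumbel approximation to the max similarity.

    Assumption (ours, for illustration): similarities are roughly i.i.d.
    Gaussian, so the maximum of n scores is approximately Gumbel distributed.
    """
    n = len(similarities)
    if n < 3:
        raise ValueError("need at least 3 similarity scores")
    mu = statistics.fmean(similarities)
    sigma = statistics.pstdev(similarities)
    root = math.sqrt(2.0 * math.log(n))
    # Classical norming constants for the maximum of n standard normals.
    scale = 1.0 / root
    location = root - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * root)
    # Gumbel quantile at probability (1 - alpha), in standardized units.
    quantile = location - scale * math.log(-math.log(1.0 - alpha))
    return mu + sigma * quantile


def detect_and_hide(similarities, top_k=3, alpha=0.01):
    """Return (indices of documents to pass to the LLM, whether the query was flagged)."""
    ranked = sorted(range(len(similarities)),
                    key=lambda i: similarities[i], reverse=True)
    flagged = similarities[ranked[0]] > gumbel_threshold(similarities, alpha)
    if flagged:
        # Hide only the single document the query is suspiciously close to.
        ranked = ranked[1:]
    return ranked[:top_k], flagged


if __name__ == "__main__":
    # Synthetic scores: a benign query has no outlier, an attack query has one.
    benign = [0.1 + 0.2 * i / 999 for i in range(1000)]
    attack = list(benign)
    attack[0] = 0.9  # near-duplicate of document 0
    print(detect_and_hide(benign))
    print(detect_and_hide(attack))
```

In this sketch a benign query keeps its ordinary top-k results, while a query whose best match lies far above the expected maximum has only that one document withheld. This mirrors the stated goal of preserving utility for benign queries, under the assumptions listed above.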
