Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks
Haowei Fu, Bo Ni, Han Xu, Kunpeng Liu, Dan Lin, Tyler Derr
TL;DR
This paper systematically compares the privacy vulnerabilities of RAG- and SFT-based knowledge-injected LLMs under membership inference attacks (MIAs) and introduces Ensemble Privacy Defense (EPD), a training-free, model-agnostic inference-time framework in which a dedicated judge LLM aggregates candidate outputs from a target model and a base model. EPD substantially reduces MIA success across multiple datasets and attack types while preserving answer quality; larger judge models and well-chosen retrieval settings strengthen the protection. The work shows that RAG typically offers stronger privacy resilience than SFT, and that a carefully designed judge-driven ensemble provides a practical path to privacy-preserving deployment in knowledge-intensive LLM applications. Ablation studies identify the judge’s capacity and the ensemble’s configuration as the key drivers, whereas adaptive noise injection yields context-dependent gains, indicating that the core privacy benefit comes from judge-based aggregation rather than auxiliary noise. Overall, EPD offers a scalable, deployment-friendly approach to mitigating membership leakage in real-world, knowledge-intensive LLM systems.
Abstract
Retrieval-Augmented Generation (RAG) and Supervised Finetuning (SFT) have become the predominant paradigms for equipping Large Language Models (LLMs) with external knowledge for diverse, knowledge-intensive tasks. However, while such knowledge injection improves performance, it also exposes new attack surfaces. Membership Inference Attacks (MIAs), which aim to determine whether a given data sample was included in a model's training set, pose serious threats to privacy and trust in sensitive domains. To this end, we first systematically evaluate the vulnerability of RAG- and SFT-based LLMs to various MIAs. Then, to address the privacy risk, we introduce a novel, model-agnostic defense framework, Ensemble Privacy Defense (EPD), which uses a dedicated judge model to evaluate and aggregate the outputs of a knowledge-injected LLM and a base LLM, enhancing resistance against MIAs. Comprehensive experiments show that, on average, EPD reduces MIA success by up to 27.8% for SFT and 526.3% for RAG compared to the inference-time baseline, while maintaining answer quality.
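
To make the judge-driven aggregation concrete, the following is a minimal Python sketch of the inference-time flow described above, not the authors' implementation. All names (`epd_answer`, `Generator`, `Judge`, and the toy models) are hypothetical, and the judge's decision rule is a placeholder assumption; in practice each callable would wrap an LLM call.

```python
from typing import Callable, List

# Hypothetical interfaces: the source does not define an API for EPD.
Generator = Callable[[str], str]         # query -> candidate answer
Judge = Callable[[str, List[str]], str]  # (query, candidates) -> final answer


def epd_answer(query: str, target: Generator, base: Generator, judge: Judge) -> str:
    """Collect one candidate per model and let the judge produce the final answer."""
    candidates = [target(query), base(query)]
    return judge(query, candidates)


# Toy stand-ins so the sketch runs without any real LLM.
def toy_target(query: str) -> str:
    return "Record 1234 lists aspirin 81 mg daily."


def toy_base(query: str) -> str:
    return "Low-dose aspirin is commonly 81 mg daily."


def toy_judge(query: str, candidates: List[str]) -> str:
    # Placeholder rule (an assumption): prefer a candidate that does not
    # mention a specific record; fall back to the first candidate otherwise.
    for answer in candidates:
        if "record" not in answer.lower():
            return answer
    return candidates[0]


final = epd_answer("What aspirin dose is typical?", toy_target, toy_base, toy_judge)
print(final)
```

Because the defense is described as training-free and model-agnostic, the sketch treats all three models as opaque callables; swapping in a different target, base, or judge model would only change the functions passed to `epd_answer`.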
