EM-MIAs: Enhancing Membership Inference Attacks in Large Language Models through Ensemble Modeling
Zichen Song, Sitan Huang, Zhongfeng Kang
TL;DR
This work addresses privacy risks in large language models by evaluating Membership Inference Attacks (MIAs) and showing that existing methods often underperform on large LLMs and on LLMs trained for a single epoch. It introduces EM-MIAs, an ensemble framework that concatenates the signals from the LOSS, Reference-based, Min-k%, and zlib attacks and feeds them into an XGBoost model to boost membership inference accuracy. Across seven diverse datasets and LLM sizes ranging from 160M to 12B parameters, EM-MIAs achieves higher AUC-ROC and accuracy than any individual MIA, demonstrating the value of combining complementary attack signals. The findings point to stronger privacy risks in LLMs than individual attacks suggest and emphasize the need for advanced privacy auditing and defense mechanisms to mitigate data leakage in real-world deployments.
Abstract
With the widespread application of large language models (LLMs), concerns about leakage of model training data have drawn increasing attention. Membership Inference Attacks (MIAs) have emerged as a critical tool for evaluating the privacy risks associated with these models. Although existing attack methods, such as LOSS, Reference-based, Min-k%, and zlib, perform well in certain scenarios, their effectiveness on large pre-trained language models often approaches random guessing, particularly when the models are trained on large-scale datasets for a single epoch. To address this issue, this paper proposes EM-MIAs, a novel ensemble attack method that integrates several existing MIA techniques (LOSS, Reference-based, Min-k%, zlib) into an XGBoost-based model to enhance overall attack performance. Experimental results demonstrate that the ensemble model significantly improves both AUC-ROC and accuracy compared to individual attack methods across various large language models and datasets. This indicates that combining the strengths of different methods identifies members of a model's training data more effectively, thereby providing a more robust tool for evaluating the privacy risks of LLMs. This study offers new directions for further research in the field of LLM privacy protection and underscores the necessity of developing more powerful privacy auditing methods.
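To make the ensemble idea concrete, the following is a minimal sketch of how the four attack signals might be assembled into one feature vector per sample. It is not the authors' implementation. The abstract does not define the individual scores, so the sketch assumes their commonly cited forms: average negative log-likelihood for LOSS, the difference between target and reference loss for Reference-based, the mean negative log-probability of the lowest-probability tokens for Min-k%, and loss divided by zlib-compressed size for zlib. All function names, the `k` default, and the input format (per-token log-probabilities already obtained from the target and reference models) are hypothetical.

```python
import zlib
from typing import List, Sequence


def loss_score(token_logprobs: Sequence[float]) -> float:
    """LOSS signal: average negative log-likelihood per token."""
    return -sum(token_logprobs) / len(token_logprobs)


def reference_score(target_logprobs: Sequence[float],
                    ref_logprobs: Sequence[float]) -> float:
    """Reference-based signal: target loss minus reference-model loss."""
    return loss_score(target_logprobs) - loss_score(ref_logprobs)


def min_k_score(token_logprobs: Sequence[float], k: float = 0.2) -> float:
    """Min-k% signal: mean negative log-prob of the k fraction of least likely tokens."""
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]  # most negative log-probs first
    return -sum(lowest) / n


def zlib_score(token_logprobs: Sequence[float], text: str) -> float:
    """zlib signal: target loss normalized by the zlib-compressed size in bytes."""
    compressed_size = len(zlib.compress(text.encode("utf-8")))
    return loss_score(token_logprobs) / compressed_size


def feature_vector(text: str,
                   target_logprobs: Sequence[float],
                   ref_logprobs: Sequence[float],
                   k: float = 0.2) -> List[float]:
    """Concatenate the four signals into one row of the ensemble's input matrix."""
    return [
        loss_score(target_logprobs),
        reference_score(target_logprobs, ref_logprobs),
        min_k_score(target_logprobs, k),
        zlib_score(target_logprobs, text),
    ]


# Toy example with made-up log-probabilities for a four-token sample.
example = feature_vector(
    text="the quick brown fox",
    target_logprobs=[-0.5, -1.5, -2.0, -4.0],
    ref_logprobs=[-1.0, -1.0, -1.0, -1.0],
    k=0.5,
)
```

Under these assumptions, one such row would be computed for every candidate sample, and the resulting matrix, paired with member and non-member labels, would be used to fit a gradient-boosted tree classifier such as XGBoost. The raw scores are left unsigned and unnormalized here because a tree-based model can learn the direction and threshold of each signal on its own.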
