Membership Inference Attacks on Machine Learning: A Survey

Hongsheng Hu, Zoran Salcic, Lichao Sun, Gillian Dobbie, Philip S. Yu, Xuyun Zhang

TL;DR

This survey provides the first comprehensive overview of membership inference attacks (MIAs) and defenses for machine learning models. It introduces taxonomies for attacks and defenses, surveys white-box and black-box settings, and covers a broad range of model types including classification, generation, embedding, and regression, as well as federated learning scenarios. The authors summarize metrics, datasets, and open-source implementations, and discuss the factors that drive MIA success, most notably overfitting and data distribution characteristics. They also outline future directions such as self-supervised learning, FL heterogeneity, and defense-utility trade-offs. An online repository accompanies the work to support benchmarking and ongoing updates in this rapidly evolving field.

Abstract

Machine learning (ML) models have been widely applied to various applications, including image classification, text generation, audio recognition, and graph data analysis. However, recent studies have shown that ML models are vulnerable to membership inference attacks (MIAs), which aim to infer whether a data record was used to train a target model. MIAs on ML models can directly lead to a privacy breach. For example, by identifying that a clinical record was used to train a model associated with a certain disease, an attacker can infer that the owner of the record very likely has the disease. In recent years, MIAs have been shown to be effective on various ML models, e.g., classification models and generative models. Meanwhile, many defense methods have been proposed to mitigate MIAs. Although MIAs on ML models form a newly emerging and rapidly growing research area, there has been no systematic survey on this topic yet. In this paper, we conduct the first comprehensive survey on membership inference attacks and defenses. We provide taxonomies for both attacks and defenses, based on their characterizations, and discuss their pros and cons. Based on the limitations and gaps identified in this survey, we point out several promising future research directions to inspire researchers who wish to work in this area. This survey not only serves as a reference for the research community but also provides a clear description for researchers outside this research domain. To further help researchers, we have created an online resource repository, which we will keep updated with future relevant work. Interested readers can find the repository at https://github.com/HongshengHu/membership-inference-machine-learning-literature.

Paper Structure

This paper contains 27 sections, 1 theorem, 23 equations, 9 figures, 4 tables.

Key Result

Theorem 1 (bentley2020quantifying)

Given access to a model with generalization gap $g = p_0 - p_1 \ge 0$ (training accuracy $p_0$ minus testing accuracy $p_1$) and the ratio $q$ of the training dataset to the input domain, there exists a membership inference attack with expected attack success rate (ASR) at least:

Figures (9)

  • Figure 1: A typical deep learning process for classification models.
  • Figure 2: Overview of white-box membership inference attacks.
  • Figure 3: Overview of black-box membership inference attacks.
  • Figure 4: Overview of the shadow training technique.
  • Figure 5: Overview of binary classifier-based attack models in black-box and white-box settings. In the membership inference phase, the black-box attack model only takes the prediction vector $\hat{p}(y \mid \bm{x})$ as input and outputs the membership status of the data record. However, the white-box attack model can take the flat vector $\bm{\nu}$ containing much more information of the data record as input and outputs its membership status.
  • ...and 4 more figures
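
The black-box setting in the Figure 5 caption, where the attack sees only the prediction vector $\hat{p}(y \mid \bm{x})$ and outputs a membership status, can be illustrated with a minimal sketch. This is not the binary classifier-based attack model from the figure: a fixed confidence threshold stands in for the trained classifier, and the function name `infer_membership`, the threshold value `0.9`, and the sample vectors are all hypothetical choices made for illustration.

```python
from typing import Sequence


def infer_membership(pred_vector: Sequence[float], threshold: float = 0.9) -> bool:
    """Guess 'member' when the target model's top confidence reaches the threshold.

    Assumption: a hand-picked threshold replaces the trained binary attack
    classifier; a real attack would learn this decision rule from data.
    """
    if not pred_vector:
        raise ValueError("prediction vector must be non-empty")
    return max(pred_vector) >= threshold


# Hypothetical prediction vectors returned by a target model.
confident = [0.97, 0.02, 0.01]   # sharply peaked, typical of memorized records
uncertain = [0.40, 0.35, 0.25]   # flat, typical of unseen records

print(infer_membership(confident))
print(infer_membership(uncertain))
```

The sketch relies on the intuition, discussed in the survey in connection with overfitting, that models tend to be more confident on records they were trained on. A white-box attack would have access to more than this single vector.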

Theorems & Definitions (1)

  • Theorem 1
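
To give a feel for why the generalization gap $g = p_0 - p_1$ matters, here is a minimal sketch of a simple baseline attack that guesses "member" exactly when the target model classifies a record correctly. It is not the attack or the bound from Theorem 1, which also depends on $q$. It assumes an evaluation set with equally many members and non-members, in which case the attack's accuracy works out to $\tfrac{1}{2} + \tfrac{g}{2}$. The function names and toy labels below are hypothetical.

```python
from typing import List, Sequence


def gap_attack(predicted: Sequence[int], true: Sequence[int]) -> List[bool]:
    """Guess 'member' for every record the target model classifies correctly."""
    return [p == t for p, t in zip(predicted, true)]


def attack_accuracy(guesses: Sequence[bool], is_member: Sequence[bool]) -> float:
    """Fraction of records whose membership status was guessed correctly."""
    correct = sum(g == m for g, m in zip(guesses, is_member))
    return correct / len(guesses)


# Toy data (hypothetical): 4 training members, 4 held-out non-members.
true_labels = [0, 1, 2, 3, 0, 1, 2, 3]
# The model is right on all members (p_0 = 1.0) and on 2 of 4 non-members (p_1 = 0.5).
model_preds = [0, 1, 2, 3, 0, 1, 0, 0]
is_member = [True] * 4 + [False] * 4

guesses = gap_attack(model_preds, true_labels)
g = 1.0 - 0.5                         # generalization gap p_0 - p_1
print(attack_accuracy(guesses, is_member))   # equals 0.5 + g / 2 on this balanced set
```

With no gap ($g = 0$), this baseline does no better than a coin flip on a balanced set, which matches the survey's point that overfitting is a main driver of MIA success.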