Membership Inference Attacks Beyond Overfitting

Mona Khalil, Alberto Blanco-Justicia, Najeeb Jebreel, Josep Domingo-Ferrer

TL;DR

This work examines the privacy risk of membership inference attacks (MIAs) beyond the classical overfitting narrative, showing that even well-generalizing models can leak information about a subset of their training samples. It combines visualization (t-SNE) and explanation (Grad-CAM) techniques to show that vulnerable points are often boundary or outlier samples whose predictions rely on irrelevant features, and it introduces a logit-reweighting defense to protect these samples. Through experiments on Purchase100 and CIFAR-10 with several attacks and defenses, the study characterizes the utility–privacy trade-offs: differential privacy offers the strongest protection, but at substantial cost in accuracy and training time, while regularization and dropout provide more favorable compromises. The results offer practical guidance for privacy-preserving ML by highlighting strategies, applicable both before and after vulnerable samples are identified, that shield those samples without sacrificing model utility across the board.

Abstract

Membership inference attacks (MIAs) against machine learning (ML) models aim to determine whether a given data point was part of the model training data. These attacks may pose significant privacy risks to individuals whose sensitive data were used for training, which motivates the use of defenses such as differential privacy, often at the cost of high accuracy losses. MIAs exploit the differences in the behavior of a model when making predictions on samples it has seen during training (members) versus those it has not seen (non-members). Several studies have pointed out that model overfitting is the major factor contributing to these differences in behavior and, consequently, to the success of MIAs. However, the literature also shows that even non-overfitted ML models can leak information about a small subset of their training data. In this paper, we investigate the root causes of membership inference vulnerabilities beyond traditional overfitting concerns and suggest targeted defenses. We empirically analyze the characteristics of the training data samples vulnerable to MIAs in models that are not overfitted (and hence able to generalize). Our findings reveal that these samples are often outliers within their classes (e.g., noisy or hard to classify). We then propose potential defensive strategies to protect these vulnerable samples and enhance the privacy-preserving capabilities of ML models. Our code is available at https://github.com/najeebjebreel/mia_analysis.
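As a rough illustration of the member versus non-member behavior gap described in the abstract, the sketch below implements the simplest kind of attack: flag a sample as a member when the model's confidence on its true label exceeds a threshold. This is not necessarily one of the attacks evaluated in the paper; the function name `threshold_mia`, the threshold value, and the confidence values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def threshold_mia(member_conf, nonmember_conf, threshold):
    """Flag a sample as 'member' when its true-label confidence exceeds `threshold`.

    Returns (tpr, fpr, balanced_accuracy) of this toy attack.
    """
    member_conf = np.asarray(member_conf, dtype=float)
    nonmember_conf = np.asarray(nonmember_conf, dtype=float)
    tpr = float(np.mean(member_conf > threshold))     # members correctly flagged
    fpr = float(np.mean(nonmember_conf > threshold))  # non-members wrongly flagged
    balanced_accuracy = 0.5 * (tpr + (1.0 - fpr))
    return tpr, fpr, balanced_accuracy


# Hypothetical confidences, invented for illustration (not results from the paper).
members = [0.99, 0.97, 0.95, 0.60]
nonmembers = [0.96, 0.80, 0.70, 0.55]

tpr, fpr, bal_acc = threshold_mia(members, nonmembers, threshold=0.9)
print(f"TPR={tpr:.2f} FPR={fpr:.2f} balanced accuracy={bal_acc:.2f}")
```

A balanced accuracy near 0.5 would mean the attack does no better than guessing. In a model that generalizes well, the two confidence distributions largely overlap, so an attack of this kind can succeed only on the few samples whose behavior still differs, which is the subset the paper studies.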

Paper Structure

This paper contains 18 sections, 2 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Impact of overfitting in CIFAR10-DenseNet. Distributions of scaled logits for member and non-member data points, accuracy metrics, and MIA metrics for several epochs.
  • Figure 2: t-SNE visualization of vulnerable samples (circled in red) with respect to the other samples of their class.
  • Figure 3: Visualization of protected and vulnerable samples and their explanations.