Understanding Membership Inferences on Well-Generalized Learning Models

Yunhui Long, Vincent Bindschaedler, Lei Wang, Diyue Bu, Xiaofeng Wang, Haixu Tang, Carl A. Gunter, Kai Chen

TL;DR

GMIA shows that membership inference can succeed even on well-generalized models, challenging the notion that overfitting is the primary privacy risk. The approach uses reference models and a vulnerable-record selection mechanism to detect the small, unique influences that individual training records have on a model. This enables direct inference and also indirect inference, which in some cases does not require querying the target record at all. The findings demonstrate that traditional generalization-based defenses (regularization) are insufficient on their own and highlight the need for data selection and differential privacy to mitigate leakage while preserving utility. Overall, the work illuminates a fundamental privacy-utility tension and proposes a framework for assessing and addressing membership leakage in practical MLaaS deployments.

Abstract

Membership Inference Attack (MIA) determines the presence of a record in a machine learning model's training data by querying the model. Prior work has shown that the attack is feasible when the model is overfitted to its training data or when the adversary controls the training algorithm. However, when the model is not overfitted and the adversary does not control the training algorithm, the threat is not well understood. In this paper, we report a study that discovers overfitting to be a sufficient but not a necessary condition for an MIA to succeed. More specifically, we demonstrate that even a well-generalized model contains vulnerable instances subject to a new generalized MIA (GMIA). In GMIA, we use novel techniques for selecting vulnerable instances and detecting their subtle influences, which are ignored by overfitting metrics. Specifically, we successfully identify individual records with high precision in real-world datasets by querying black-box machine learning models. Furthermore, we show that a vulnerable record can even be attacked indirectly by querying other related records, and that existing generalization techniques are less effective in protecting the vulnerable instances. Our findings sharpen the understanding of the fundamental cause of the problem: the unique influences the training instance may have on the model.
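The idea of deciding between "record in the training data" and "record not in the training data" with the help of reference models can be illustrated with a small hypothesis-test sketch. This is not the paper's exact procedure: it assumes the adversary has already collected, for one target record, the true-class probability from several reference models trained without that record. The function names, the use of a single confidence value as the test statistic, and the threshold `alpha` are all illustrative assumptions.

```python
import numpy as np


def empirical_p_value(reference_confidences, target_confidence):
    """One-sided empirical p-value under the 'not a member' hypothesis.

    reference_confidences: true-class probabilities from reference models
        that were trained WITHOUT the target record (hypothetical input).
    target_confidence: true-class probability from the target model.
    """
    ref = np.asarray(reference_confidences, dtype=float)
    # Count reference models at least as confident as the target model.
    at_least_as_high = np.sum(ref >= target_confidence)
    # Add-one smoothing keeps the p-value strictly positive when only a
    # finite number of reference models is available.
    return (at_least_as_high + 1) / (ref.size + 1)


def infer_membership(reference_confidences, target_confidence, alpha=0.05):
    """Reject 'not a member' when the target model's confidence is
    unusually high compared with the reference models."""
    p = empirical_p_value(reference_confidences, target_confidence)
    return bool(p < alpha)


if __name__ == "__main__":
    # Synthetic numbers for illustration only, not results from the paper.
    rng = np.random.default_rng(0)
    reference = np.clip(rng.normal(0.6, 0.05, size=100), 0.0, 1.0)
    print(infer_membership(reference, target_confidence=0.95))
```

A lower `alpha` makes the sketch claim membership less often, which trades recall for precision. That matches the abstract's emphasis on identifying individual records with high precision rather than attacking every record.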

Paper Structure

This paper contains 29 sections, 4 equations, 14 figures, 7 tables, 1 algorithm.

Figures (14)

  • Figure 1: Understanding the unique influence of a record through a toy example dataset (a). The adversary performs MIA by fingerprinting the target record's influence on the model's outputs (predicted class probabilities). There are two competing hypotheses: (1) $H_{\rm in}$: record is part of the training data, and (2) $H_{\rm out}$: record is not part of the training data. The adversary infers membership status by estimating which hypothesis is more likely based on the model's outputs.
  • Figure 2: Attack Overview
  • Figure 3: Last layer output of a two-layer neural network
  • Figure 4: Generate new features for vulnerable record selection
  • Figure 5: Steps for generating enhancing records.
  • ...and 9 more figures