
LOGAN: Membership Inference Attacks Against Generative Models

Jamie Hayes, Luca Melis, George Danezis, Emiliano De Cristofaro

TL;DR

The paper introduces the first membership inference attacks against generative models, using white-box and black-box GAN-based strategies to determine whether a given data record was part of the training set. The attacks target DCGAN, BEGAN, and DCGAN+VAE models trained on datasets including LFW, CIFAR-10, and Diabetic Retinopathy. White-box attacks show near-perfect leakage against several targets, while black-box attacks achieve substantial leakage that improves with auxiliary knowledge, though it often remains below white-box levels. Defenses such as weight normalization, dropout, and differential privacy provide limited protection, sometimes at the cost of sample quality or training stability; BEGAN models tend to generalize better and leak less. The work highlights practical privacy risks in generative models, offers a framework for evaluating privacy leakage, and suggests that leakage correlates with generalization performance, motivating future defenses and broader evaluations.

Abstract

Generative models estimate the underlying distribution of a dataset to generate realistic samples according to that distribution. In this paper, we present the first membership inference attacks against generative models: given a data point, the adversary determines whether or not it was used to train the model. Our attacks leverage Generative Adversarial Networks (GANs), which combine a discriminative and a generative model, to detect overfitting and recognize inputs that were part of training datasets, using the discriminator's capacity to learn statistical differences in distributions. We present attacks based on both white-box and black-box access to the target model, against several state-of-the-art generative models, over datasets of complex representations of faces (LFW), objects (CIFAR-10), and medical images (Diabetic Retinopathy). We also discuss the sensitivity of the attacks to different training parameters, and their robustness against mitigation strategies, finding that defenses are either ineffective or significantly degrade the generative models' performance in terms of training stability and/or sample quality.
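The white-box idea (an overfitted discriminator is more confident on records it was trained on) can be sketched as a ranking procedure. This is a minimal illustration, not the authors' code: it assumes the attacker can call the target discriminator as a function returning D(x), the probability that x is "real", and assumes the attacker knows (or guesses) the number n of training members among the candidates. The names `white_box_attack` and `attack_accuracy` are hypothetical.

```python
from typing import Callable, List, Sequence, Set, TypeVar

Record = TypeVar("Record")


def white_box_attack(
    discriminator: Callable[[Record], float],
    candidates: Sequence[Record],
    n: int,
) -> List[int]:
    """Return the indices of the n candidates predicted to be training members.

    discriminator: assumed to map a record x to D(x) in [0, 1].
    n: assumed number of training records among the candidates.
    """
    # Query the discriminator on every candidate and keep its output probability.
    scores = [discriminator(x) for x in candidates]
    # Sort candidate indices by score, most confident first (ties keep input order).
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    # Predict the top-n ranked candidates as members of the training set.
    return ranked[:n]


def attack_accuracy(predicted: Sequence[int], members: Set[int]) -> float:
    """Fraction of predicted members that were actually in the training set."""
    if not predicted:
        return 0.0
    return sum(1 for i in predicted if i in members) / len(predicted)


if __name__ == "__main__":
    # Hypothetical toy discriminator that is overconfident on records 0 and 2.
    toy_d = lambda x: 0.9 if x in (0, 2) else 0.4
    pred = white_box_attack(toy_d, [0, 1, 2, 3], n=2)
    print(sorted(pred), attack_accuracy(pred, {0, 2}))  # prints "[0, 2] 1.0"
```

Ranking and taking a fixed-size top slice avoids choosing an absolute probability threshold, which would depend on how well calibrated the target discriminator happens to be.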

Paper Structure

This paper contains 29 sections, 1 equation, 24 figures, 1 table.

Figures (24)

  • Figure 1: Generative Adversarial Network (GAN).
  • Figure 2: High-level Outline of the White-Box Attack.
  • Figure 3: White-Box Prediction Method: The attacker inputs data-points to the Discriminator $D$ (1), extracts the output probabilities (2), and sorts them (3).
  • Figure 4: High-level overview of the (a) black-box attack with no auxiliary knowledge, and (b) Discriminative and (c) Generative black-box attack with limited auxiliary attacker knowledge.
  • Figure 5: Accuracy of white-box attack with different datasets and training sets.
  • ...and 19 more figures
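The black-box attack with no auxiliary knowledge (Figure 4a) can be read as: sample from the target generator, train a local GAN on those samples, then rank candidates with the local discriminator. The sketch below encodes that flow under stated assumptions: `query_target` and `train_local_gan` are hypothetical callables supplied by the attacker, and the toy trainer in the usage example is a frequency-count stand-in, not a GAN.

```python
import random
from typing import Callable, Dict, List, Sequence, TypeVar

Record = TypeVar("Record")
Discriminator = Callable[[Record], float]


def black_box_attack(
    query_target: Callable[[], Record],
    train_local_gan: Callable[[List[Record]], Discriminator],
    candidates: Sequence[Record],
    n: int,
    num_queries: int = 1000,
) -> List[int]:
    """Return indices of the n candidates predicted to be training members."""
    # Step 1: the attacker only sees samples produced by the target generator.
    synthetic = [query_target() for _ in range(num_queries)]
    # Step 2: train a local GAN on these samples and keep its discriminator
    # (train_local_gan is a hypothetical placeholder for real GAN training).
    local_d = train_local_gan(synthetic)
    # Step 3: rank candidates by the local discriminator's confidence.
    scores = [local_d(x) for x in candidates]
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


def toy_trainer(samples: List[int]) -> Discriminator:
    """Stand-in for GAN training: score a record by its frequency in the samples."""
    counts: Dict[int, int] = {}
    for s in samples:
        counts[s] = counts.get(s, 0) + 1
    total = len(samples)
    return lambda x: counts.get(x, 0) / total


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical overfitted target that mostly reproduces training records 0 and 2.
    query = lambda: rng.choice([0, 2])
    print(sorted(black_box_attack(query, toy_trainer, [0, 1, 2, 3], n=2, num_queries=200)))
```

In a real attack the local discriminator only approximates the target's behaviour, which is consistent with the TL;DR's observation that black-box leakage often stays below white-box levels.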