Efficient Membership Inference Attacks by Bayesian Neural Network

Zhenlong Liu, Wenyu Jiang, Feng Zhou, Hongxin Wei

TL;DR

This work addresses privacy risks in supervised learning by studying Membership Inference Attacks (MIAs) from a Bayesian perspective. It introduces the Bayesian Membership Inference Attack (BMIA), which converts a single trained reference model into a Bayesian neural network via Laplace approximation. The resulting predictive distribution captures both epistemic and aleatoric uncertainty, enabling a conditional, per-example attack without training multiple shadow models. The approach achieves state-of-the-art or competitive results across five datasets (Texas100, Purchase100, CIFAR-10, CIFAR-100, ImageNet) at significantly lower computational cost, with substantial TPR gains at very low FPRs and shorter training time. The paper also gives theoretical insight into why conditional attacks outperform marginal ones, analyzes how sample size and Hessian factorization affect performance, and reports robust behavior under model mismatch and OOD conditions. Overall, BMIA offers an efficient and robust tool for auditing privacy risks in neural networks, using last-layer Laplace inference to quantify conditional score distributions.

Abstract

Membership Inference Attacks (MIAs) aim to estimate whether a specific data point was used in the training of a given model. Previous attacks often use multiple reference models to approximate the conditional score distribution, leading to significant computational overhead. While recent work leverages quantile regression to estimate conditional thresholds, it fails to capture epistemic uncertainty, resulting in bias in low-density regions. In this work, we propose a novel approach, Bayesian Membership Inference Attack (BMIA), which performs a conditional attack through Bayesian inference. In particular, we transform a trained reference model into a Bayesian neural network by Laplace approximation, enabling direct estimation of the conditional score distribution through probabilistic model parameters. Our method addresses both epistemic and aleatoric uncertainty with only a single reference model, enabling an efficient and powerful MIA. Extensive experiments on five datasets demonstrate the effectiveness and efficiency of BMIA.
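The idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it assumes a binary logistic last layer on top of frozen features, a full (unfactorized) Hessian, and an empirical quantile as the per-example threshold. All function names and parameters (`fit_map_last_layer`, `laplace_covariance`, `sample_reference_scores`, `conditional_attack`, `prior_precision`) are hypothetical, and the signed logit is used as a binary stand-in for the hinge score. Because only last-layer weights are sampled, the sketch mainly illustrates the epistemic part of the uncertainty.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_map_last_layer(features, labels, prior_precision=1.0, steps=300, lr=0.5):
    """MAP estimate of a binary logistic last layer via gradient descent (assumed setup)."""
    n, d = features.shape
    w = np.zeros(d)
    for _ in range(steps):
        # Gradient of the negative log posterior: logistic loss plus Gaussian prior.
        grad = features.T @ (sigmoid(features @ w) - labels) + prior_precision * w
        w -= lr * grad / n
    return w

def laplace_covariance(features, w_map, prior_precision=1.0):
    """Laplace approximation: posterior covariance = inverse Hessian at the MAP."""
    p = sigmoid(features @ w_map)
    hessian = (features * (p * (1.0 - p))[:, None]).T @ features
    hessian += prior_precision * np.eye(features.shape[1])
    cov = np.linalg.inv(hessian)
    return 0.5 * (cov + cov.T)  # symmetrize against round-off

def sample_reference_scores(phi_x, y, w_map, cov, n_samples, rng):
    """Scores of one example (features phi_x, label y) under sampled last-layer weights."""
    weights = rng.multivariate_normal(w_map, cov, size=n_samples)
    return (2 * y - 1) * (weights @ phi_x)  # signed logit as a hinge-like score

def conditional_attack(target_score, reference_scores, alpha):
    """Flag as member if the target score exceeds the per-example (1 - alpha) quantile."""
    tau = np.quantile(reference_scores, 1.0 - alpha)
    return bool(target_score > tau), float(tau)

# Synthetic usage: the features stand in for a reference model's penultimate layer.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))
labels = (features @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) > 0).astype(float)
w_map = fit_map_last_layer(features, labels)
cov = laplace_covariance(features, w_map)
scores = sample_reference_scores(features[0], labels[0], w_map, cov, 1000, rng)
is_member, tau = conditional_attack(target_score=5.0, reference_scores=scores, alpha=0.05)
```

The design point is that the threshold `tau` is computed separately for each example from a single reference model, rather than from many independently trained shadow models.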

Paper Structure

This paper contains 35 sections, 2 theorems, 27 equations, 7 figures, 2 tables, 1 algorithm.

Key Result

Proposition 4.1

Suppose $P$ and $Q$ follow normal distributions such that $P \sim \mathcal{N}(\mu_\mathcal{S},\sigma_{\mathcal{S}}^2)$ and $Q \sim \mathcal{N}(\mu_\mathcal{D},\sigma_{\mathcal{D}}^2)$. Then the true positive rate (TPR) at a low FPR $\alpha$ has a closed form expressed through $\Phi(\cdot)$, the cumulative distribution function of the standard normal distribution.
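As a worked illustration, one can derive such a closed form under assumptions that this summary does not confirm: $P$ is taken to be the score distribution of members, $Q$ that of non-members, and a sample is flagged as a member when its score exceeds a threshold $\tau$. Fixing the FPR at $\alpha$ gives $\tau = \mu_\mathcal{D} + \sigma_\mathcal{D}\,\Phi^{-1}(1-\alpha)$, and therefore $\mathrm{TPR} = 1 - \Phi\big((\tau - \mu_\mathcal{S})/\sigma_\mathcal{S}\big)$. The paper's own statement may be written in a different but equivalent form. The function name `gaussian_tpr_at_fpr` is hypothetical.

```python
from statistics import NormalDist

def gaussian_tpr_at_fpr(mu_s, sigma_s, mu_d, sigma_d, alpha):
    """TPR at FPR alpha when member scores ~ N(mu_s, sigma_s^2) and
    non-member scores ~ N(mu_d, sigma_d^2) (role assignment is an assumption)."""
    std_normal = NormalDist()
    # Threshold that leaves probability mass alpha of non-members above it.
    tau = mu_d + sigma_d * std_normal.inv_cdf(1.0 - alpha)
    # Probability mass of members above the same threshold.
    return 1.0 - std_normal.cdf((tau - mu_s) / sigma_s)

# Narrower non-member distributions lower the threshold and raise the TPR,
# which is one way to see why a per-example (conditional) threshold can help.
tpr_wide = gaussian_tpr_at_fpr(mu_s=1.0, sigma_s=1.0, mu_d=0.0, sigma_d=1.0, alpha=0.01)
tpr_narrow = gaussian_tpr_at_fpr(mu_s=1.0, sigma_s=1.0, mu_d=0.0, sigma_d=0.5, alpha=0.01)
```

When the two distributions coincide, the expression reduces to $\mathrm{TPR} = \alpha$, which is the expected behavior of an attack with no signal.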

Figures (7)

  • Figure 1: Results on a toy regression task with non-uniform distribution data (gray points). The shaded regions represent 90% prediction intervals estimated by BNN and quantile regression.
  • Figure 2: Conditional probability distribution function for a non-member sample with a higher hinge score over the non-member world, as estimated by our attack BMIA, LiRA (Carlini et al., 2022), QMIA (Bertran et al., 2024), and the marginal attack (Ye et al., 2022). Dashed lines are the estimated thresholds $\tau_{\alpha}$ ($\alpha = 5\%$) defined for the metric-based and per-example attacks. The target score refers to the score obtained from the target model for this non-member sample.
  • Figure 3: Trends of TPRs at 0.1% and 1% FPR, along with inference time, on CIFAR-10/100 across different sample sizes and Hessian factorization methods.
  • Figure 4: TPR at 1% FPR for the scenario where the reference model may be mismatched with the target model; all models are trained on CIFAR-10.
  • Figure 5: Attack performance comparison of variations of Hessian approximation on CIFAR-10.
  • ...and 2 more figures
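
The metric reported in Figures 3 and 4, TPR at a fixed low FPR, can be computed from attack scores roughly as follows. This is a generic sketch rather than the paper's evaluation code: the function name `tpr_at_fixed_fpr` is hypothetical, and it assumes each sample has already been reduced to a single scalar where larger values indicate membership (for a per-example attack, this could be the target score minus its per-example threshold).

```python
import numpy as np

def tpr_at_fixed_fpr(member_scores, nonmember_scores, target_fpr):
    """Return (tpr, fpr, threshold) with the empirical FPR not exceeding target_fpr."""
    member_scores = np.asarray(member_scores, dtype=float)
    nonmember_scores = np.asarray(nonmember_scores, dtype=float)
    sorted_desc = np.sort(nonmember_scores)[::-1]
    # Number of non-members we are allowed to misclassify as members.
    allowed_fp = int(np.floor(target_fpr * len(sorted_desc)))
    # Use the (allowed_fp + 1)-th largest non-member score as the threshold;
    # only scores strictly above it are flagged, so ties cannot inflate the FPR.
    threshold = sorted_desc[allowed_fp] if allowed_fp < len(sorted_desc) else -np.inf
    tpr = float(np.mean(member_scores > threshold))
    fpr = float(np.mean(nonmember_scores > threshold))
    return tpr, fpr, float(threshold)

# Small hand-checkable example: one of four non-members may be a false positive.
tpr, fpr, threshold = tpr_at_fixed_fpr([3, 4, 5, 6], [0, 1, 2, 3.5], target_fpr=0.25)
```

At FPRs as low as 0.1%, the estimate depends on only a handful of non-member samples, so a large evaluation set is needed for the number to be stable.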

Theorems & Definitions (5)

  • Definition 2.1
  • Proposition 4.1
  • Proposition 4.2
  • proof
  • proof