Do Parameters Reveal More than Loss for Membership Inference?

Anshuman Suri, Xiao Zhang, David Evans

TL;DR

This paper shows that black-box access is generally insufficient for optimal membership inference (MI) against models trained with stochastic gradient descent (SGD): optimal inference requires access to model parameters. It derives a theoretical framework linking SGD dynamics, the Hessian, and the posterior over parameters to MI, and introduces the Inverse Hessian Attack (IHA), a white-box auditing method that scores records using inverse-Hessian-vector products and gradient signals. Empirically, IHA matches or surpasses state-of-the-art reference-model attacks on several datasets, highlights the critical role of Hessian-informed terms, and can audit privacy leakage without training reference models, albeit at notable computational cost. The findings motivate broader investigation of white-box MI methods, careful consideration of Hessian structure, and practical damping and approximation strategies for scalable privacy auditing in real-world systems.

Abstract

Membership inference attacks are used as a key tool for disclosure auditing. They aim to infer whether an individual record was used to train a model. While such evaluations are useful to demonstrate risk, they are computationally expensive and often make strong assumptions about potential adversaries' access to models and training environments, and thus do not provide tight bounds on leakage from potential attacks. We show how prior claims around black-box access being sufficient for optimal membership inference do not hold for stochastic gradient descent, and that optimal membership inference indeed requires white-box access. Our theoretical results lead to a new white-box inference attack, IHA (Inverse Hessian Attack), that explicitly uses model parameters by taking advantage of computing inverse-Hessian vector products. Our results show that both auditors and adversaries may be able to benefit from access to model parameters, and we advocate for further research into white-box methods for membership inference.
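
To make the phrase "inverse-Hessian vector product" concrete, here is a minimal sketch rather than the authors' IHA implementation. It assumes a small logistic-regression model, a damping term added to the Hessian, and a conjugate-gradient solver; the function names `hvp_logistic` and `ihvp_cg` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def hvp_logistic(w, X, v):
    """Hessian-vector product of the mean logistic loss at w, without forming H."""
    p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
    s = p * (1.0 - p)                  # per-record curvature weights
    return X.T @ (s * (X @ v)) / X.shape[0]

def ihvp_cg(hvp, g, damping=1e-3, tol=1e-10, max_iter=500):
    """Approximately solve (H + damping * I) x = g with conjugate gradient."""
    x = np.zeros_like(g)
    r = g.copy()                       # residual g - A x for the initial x = 0
    p = r.copy()                       # first search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:          # residual norm small enough: stop
            break
        Ap = hvp(p) + damping * p      # apply the damped Hessian to p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy data (assumed): 200 records, 5 parameters, an arbitrary vector g.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w = rng.normal(size=5)
g = rng.normal(size=5)
x = ihvp_cg(lambda v: hvp_logistic(w, X, v), g, damping=1e-3)
```

The solver only ever calls the Hessian-vector product, so the full Hessian is never materialized; the damping value trades accuracy for numerical stability when the Hessian is close to singular.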

Paper Structure

This paper contains 29 sections, 5 theorems, 35 equations, 1 figure, and 9 tables.

Key Result

Lemma 2.1

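Let $\mathcal{T} = \{\bm{z}_2,\ldots,\bm{z}_n, m_2, \ldots, m_n\}$. Given model parameters $\bm{w}$ and a record $\bm{z}_1$, the optimal membership inference is given by:

$$\mathbb{P}(m_1 = 1 \mid \bm{w}, \bm{z}_1) = \mathbb{E}_{\mathcal{T}}\left[\sigma\left(\log\frac{p(\bm{w} \mid m_1 = 1, \bm{z}_1, \mathcal{T})}{p(\bm{w} \mid m_1 = 0, \bm{z}_1, \mathcal{T})} + \log\frac{\gamma}{1-\gamma}\right)\right],$$

where $\sigma(u) = (1+\exp(-u))^{-1}$ is the sigmoid function, and $\gamma = \mathbb{P}(m_1 = 1)$ is the prior probability that $\bm{z}_1$ is a member.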
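
As a rough numerical illustration of this lemma, the sketch below assumes the form given by Sablayrolles et al. (2019): the score averages, over sampled $\mathcal{T}$, the sigmoid of the log posterior ratio (member versus non-member) plus the prior log-odds $\log(\gamma/(1-\gamma))$. The function name `optimal_mi_score` and its inputs are hypothetical; in practice the two log-densities are not directly available and must be approximated.

```python
import numpy as np

def optimal_mi_score(logp_member, logp_nonmember, gamma=0.5):
    """Monte Carlo estimate of the optimal membership score.

    logp_member[i]    : log p(w | m_1 = 1, z_1, T_i) for the i-th sampled T (assumed given)
    logp_nonmember[i] : log p(w | m_1 = 0, z_1, T_i) for the same T_i (assumed given)
    gamma             : prior probability P(m_1 = 1)
    """
    logp_member = np.asarray(logp_member, dtype=float)
    logp_nonmember = np.asarray(logp_nonmember, dtype=float)
    prior_log_odds = np.log(gamma) - np.log1p(-gamma)   # log(gamma / (1 - gamma))
    u = logp_member - logp_nonmember + prior_log_odds
    return float(np.mean(1.0 / (1.0 + np.exp(-u))))     # average sigmoid over samples of T
```

When the two posteriors agree, the score falls back to the prior $\gamma$; a posterior ratio above one pushes the score toward membership.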

Figures (1)

  • Figure 1: ROC curves for low-FPR region for various attacks and datasets.

Theorems & Definitions (6)

  • Definition 2.1: Membership Inference
  • Lemma 2.1: Sablayrolles et al. (2019)
  • Theorem 2.2: SGD Stationary distribution with momentum
  • Theorem 2.3: SGD Noise Covariance
  • Theorem 3.1: Posterior for SGD
  • Theorem 3.2: Optimal Membership-Inference Score