Practical Bayes-Optimal Membership Inference Attacks

Marcus Lassila, Johan Östman, Khac-Hoang Ngo, Alexandre Graell i Amat

TL;DR

This paper studies membership inference attacks (MIAs) on both i.i.d. and graph-structured data. It derives the Bayes-optimal membership inference rule for node-level attacks on graph neural networks (GNNs) and introduces two tractable approximations of it: G-BASE for graphs and BASE, which also applies to i.i.d. data. The Bayes-optimal rule for graph data accounts for neighborhood influence in GNNs, and sampling strategies are used to approximate the otherwise intractable expectations. G-BASE outperforms previously proposed classifier-based node-level MIAs, with the clearest gains on larger graphs. BASE matches or exceeds LiRA and RMIA at a significantly lower computational cost, requiring far fewer model queries. The paper also shows that BASE and RMIA are equivalent under a specific hyperparameter setting, which gives RMIA a Bayes-optimal justification. Together, these results make the framework a principled and scalable tool for privacy auditing of graph and non-graph learning systems.

Abstract

We develop practical and theoretically grounded membership inference attacks (MIAs) against both independent and identically distributed (i.i.d.) data and graph-structured data. Building on the Bayesian decision-theoretic framework of Sablayrolles et al., we derive the Bayes-optimal membership inference rule for node-level MIAs against graph neural networks, addressing key open questions about optimal query strategies in the graph setting. We introduce BASE and G-BASE, tractable approximations of the Bayes-optimal membership inference. G-BASE achieves superior performance compared to previously proposed classifier-based node-level MIA attacks. BASE, which is also applicable to non-graph data, matches or exceeds the performance of prior state-of-the-art MIAs, such as LiRA and RMIA, at a significantly lower computational cost. Finally, we show that BASE and RMIA are equivalent under a specific hyperparameter setting, providing a principled, Bayes-optimal justification for the RMIA attack.

Paper Structure

This paper contains 30 sections, 4 theorems, 36 equations, 7 figures, 12 tables, 1 algorithm.

Key Result

Theorem 1

Given a graph $\mathcal{G}=(\bm{X},\bm{Y},\bm{A})$ and an $L$-layer GNN model $\bm{\theta}$ trained on an induced subgraph of $\mathcal{G}$ to minimize the objective in Eq:NLL-loss-graph, the posterior probability $P(m_v=1|\bm{\theta},\mathcal{G})$ is given by where with and the latter representing the likelihood ratio prior to observing the labels. In Eq:BayesOptimalMI, $\lambda=P(m_v=1)$ den
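For orientation, the general shape of such a posterior follows from Bayes' rule alone. The identity below is a generic sketch, not the theorem's own expressions, which are not reproduced in this summary. It assumes only the prior $\lambda=P(m_v=1)$ named above; the likelihood notation $p(\bm{\theta}\mid m_v,\mathcal{G})$ and the sigmoid $\sigma$ are introduced here for illustration.

$$
P(m_v=1\mid\bm{\theta},\mathcal{G})
=\frac{\lambda\,p(\bm{\theta}\mid m_v=1,\mathcal{G})}{\lambda\,p(\bm{\theta}\mid m_v=1,\mathcal{G})+(1-\lambda)\,p(\bm{\theta}\mid m_v=0,\mathcal{G})}
=\sigma\!\left(\log\frac{\lambda}{1-\lambda}+\log\frac{p(\bm{\theta}\mid m_v=1,\mathcal{G})}{p(\bm{\theta}\mid m_v=0,\mathcal{G})}\right),
\qquad \sigma(u)=\frac{1}{1+e^{-u}}.
$$

In this form the prior enters only through the log-odds term, so the attack reduces to estimating a log-likelihood ratio and comparing it with a threshold.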

Figures (7)

  • Figure 1: ROC curves of our attack and prior MIAs on the Flickr dataset, averaged over 10 GCN target models.
  • Figure 2: Visualization of the Bayes-optimal attack. From left to right: the challenger trains the target model $\bm{\theta}$ on $\mathcal{G}_{\mathrm{train}}$, and provides the trained model, a target node $v$, the underlying graph $\mathcal{G}$, and (optionally) auxiliary information $\mathcal{H}$ detailing the training procedure. The adversary then samples $K$ graphs and trains a corresponding set of shadow models $\{\bm{\phi}_i\}_{i=1}^K$. These sampled graphs may or may not contain the target node $v$. Finally, the adversary estimates the membership of $v$ using the Bayes-optimal decision rule in Eq:BayesOptimalMI, approximated via the Monte Carlo method in Eq:MonteCarloExpectationGraph.
  • Figure 3: ROC of a mismatched attack averaged over 10 independent target models. Shadow models use a GAT architecture and a different training procedure. The online attack uses 8 shadow models; the offline attack uses 4.
  • Figure 4: Average ROC curves (10 runs) for the Amazon-Photo dataset with GraphSAGE as target model.
  • Figure 5: Average ROC curves (10 runs) for the PubMed dataset with GCN as target model.
  • ...and 2 more figures
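
The pipeline in the Figure 2 caption (train $K$ shadow models, then approximate an expectation by Monte Carlo) can be illustrated with a minimal Python sketch. This is not the paper's estimator: it assumes the loss-based score form from the Sablayrolles et al. framework, where the target model's loss on a sample is compared with a log-mean-exp over shadow-model losses. The function names, the `temperature` parameter, and the example loss values are all hypothetical.

```python
import numpy as np


def membership_score(target_loss, shadow_losses, temperature=1.0):
    """Monte Carlo membership score from per-sample losses (illustrative).

    Assumed form: score = -loss(theta)/T - log mean_i exp(-loss(phi_i)/T),
    where theta is the target model and phi_i are the K shadow models.
    Higher scores suggest the sample was more likely a training member.
    """
    a = -np.asarray(shadow_losses, dtype=float) / temperature
    # Numerically stable log-mean-exp over the K shadow models.
    m = a.max()
    log_mean_exp = m + np.log(np.mean(np.exp(a - m)))
    return -float(target_loss) / temperature - log_mean_exp


def membership_posterior(score, prior=0.5):
    """Map a score to a posterior via sigmoid(score + prior log-odds)."""
    logit = score + np.log(prior / (1.0 - prior))
    return 1.0 / (1.0 + np.exp(-logit))


if __name__ == "__main__":
    # Hypothetical losses: the target model fits the sample better than
    # the shadow models do, which pushes the score above zero.
    score = membership_score(target_loss=0.1, shadow_losses=[1.2, 0.9, 1.0, 1.4])
    print(score, membership_posterior(score))
```

With a cross-entropy loss and `temperature=1.0`, `exp(-loss)` is the model's probability of the true label, so this score is the log of the target model's confidence divided by the average shadow-model confidence. The log-mean-exp is computed in shifted form to avoid underflow when losses are large.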

Theorems & Definitions (14)

  • Definition 1
  • Theorem 1
  • proof
  • Definition 2
  • Corollary 1
  • proof
  • Definition 3
  • Definition 4
  • Theorem 2
  • proof
  • ...and 4 more