Revisiting the LiRA Membership Inference Attack Under Realistic Assumptions

Najeeb Jebreel, Mona Khalil, David Sánchez, Josep Domingo-Ferrer

TL;DR

LiRA, and likely weaker MIAs, appear less effective under realistic conditions than previously reported. Reliable privacy auditing requires evaluation protocols that reflect practical training practices, feasible attacker assumptions, and reproducibility considerations.

Abstract

Membership inference attacks (MIAs) have become the standard tool for evaluating privacy leakage in machine learning (ML). Among them, the Likelihood-Ratio Attack (LiRA) is widely regarded as the state of the art when sufficient shadow models are available. However, prior evaluations have often overstated the effectiveness of LiRA by attacking models overconfident on their training samples, calibrating thresholds on target data, assuming balanced membership priors, and/or overlooking attack reproducibility. We re-evaluate LiRA under a realistic protocol that (i) trains models using anti-overfitting (AOF) and transfer learning (TL), when applicable, to reduce overconfidence as in production models; (ii) calibrates decision thresholds using shadow models and data rather than target data; (iii) measures positive predictive value (PPV, or precision) under shadow-based thresholds and skewed membership priors ($\pi \le 10\%$); and (iv) quantifies per-sample membership reproducibility across different seeds and training variations. We find that AOF significantly weakens LiRA, and that TL further reduces attack effectiveness while improving model accuracy. Under shadow-based thresholds and skewed priors, LiRA's PPV often drops substantially, especially under AOF or AOF+TL. We also find that thresholded vulnerable sets at extremely low FPR show poor reproducibility across runs, while likelihood-ratio rankings are more stable. These results suggest that LiRA, and likely weaker MIAs, are less effective than previously suggested under realistic conditions, and that reliable privacy auditing requires evaluation protocols that reflect practical training practices, feasible attacker assumptions, and reproducibility considerations. Code is available at https://github.com/najeebjebreel/lira_analysis.
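
To illustrate why the membership prior matters for PPV, the following is a minimal sketch of the standard Bayes relation between TPR, FPR, and the prior. It is not taken from the authors' repository; the function name `ppv` and the operating point (TPR = 5%, FPR = 0.1%) are hypothetical values chosen only for illustration.

```python
def ppv(tpr: float, fpr: float, prior: float) -> float:
    """Positive predictive value (precision) via Bayes' rule.

    tpr   : true-positive rate of the attack at the chosen threshold
    fpr   : false-positive rate at the same threshold
    prior : fraction of queried samples that are true members
    """
    true_pos = prior * tpr            # probability mass of correct "member" calls
    false_pos = (1.0 - prior) * fpr   # probability mass of wrong "member" calls
    if true_pos + false_pos == 0.0:
        return float("nan")           # the attack never predicts "member"
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point (not a result from the paper).
TPR, FPR = 0.05, 0.001
for prior in (0.5, 0.1, 0.01):
    print(f"prior={prior:>4}: PPV={ppv(TPR, FPR, prior):.3f}")
```

With these illustrative numbers, the same attack goes from a PPV of about 0.98 under a balanced prior to about 0.85 at a 10% prior and about 0.34 at a 1% prior, which is why evaluating only under balanced priors can overstate precision.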

Paper Structure (64 sections, 15 figures, 15 tables)


Figures (15)

  • Figure 1: Distributions of decision thresholds of the online LiRA variant at nominal FPRs of $0.001\%\,(10^{-5})$ and $0.1\%\,(10^{-3})$ on CIFAR-10 (ResNet-18, AOF). Boxes show interquartile ranges with medians and relative median absolute deviations (rMAD) from 256 shadow models of a single run (a) and 12 independent runs (b). Threshold variability is substantially higher at $0.001\%$ than at $0.1\%$, indicating unstable calibration at the extreme tail required for high inference precision.
  • Figure 2: Reproducibility, stability, and coverage vs. seeds, training variations, and runs (TP$\geq$1).
  • Figure 3: Reproducibility, stability, and coverage of zero-FP LiRA positives across runs (identical settings). Rows: number of runs combined ($k$). Columns: within-run support threshold $x$ (TP$\ge x$ among 128 IN shadows), all at FP$=0$. Modest support ($x\in[2,5]$) improves reproducibility, while very strong support ($x\ge20$) reduces both stability and coverage.
  • Figure 4: Top-9 most vulnerable samples in 10 different runs. The upper row and the leftmost run in the lower row show six independent runs (identical settings, different seeds). The remaining runs in the lower row are, in order: one run with a slight change in batch size and dropout, one run using MixUp instead of CutMix together with a modest dropout increase, one run using a different but similar architecture (ResNet-18 vs. WideResNet28-2), and one run with transfer learning. On each image, TP denotes the number of true detections (true positives) among the 128 IN shadow models in that run, while FP denotes the number of false detections among the 128 OUT shadow models (FP$=0$ for all displayed samples).
  • Figure 5: Rank displacement under run-to-run variability. Curves show the fraction of displaced samples whose percentile rank remained within the Top-$(q{+}\Delta)\%$.
  • ...and 10 more figures
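
The shadow-based threshold calibration and the threshold-variability measure mentioned in the Figure 1 caption can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the helper names `calibrate_threshold` and `relative_mad` are hypothetical, scores are assumed to be "higher means more likely a member", and rMAD is assumed here to mean the median absolute deviation divided by the absolute median, which may differ from the paper's exact definition.

```python
import numpy as np


def calibrate_threshold(out_scores, target_fpr):
    """Pick a threshold from non-member (OUT) scores for a nominal FPR.

    A sample is flagged as a member when score > threshold, so the returned
    threshold lets at most floor(target_fpr * n) OUT scores exceed it.
    """
    scores = np.sort(np.asarray(out_scores, dtype=float))
    n = scores.size
    # Number of false positives allowed on the calibration set
    # (small epsilon guards against floating-point round-down).
    k = min(int(np.floor(target_fpr * n + 1e-9)), n - 1)
    return scores[n - k - 1]


def relative_mad(values):
    """Assumed rMAD: median absolute deviation divided by |median|."""
    v = np.asarray(values, dtype=float)
    med = np.median(v)
    return np.median(np.abs(v - med)) / abs(med)


if __name__ == "__main__":
    # Synthetic OUT scores standing in for 12 independent runs (assumption:
    # standard normal scores; real LiRA scores come from shadow models).
    rng = np.random.default_rng(0)
    for fpr in (1e-3, 1e-5):
        thresholds = [
            calibrate_threshold(rng.normal(size=200_000), fpr)
            for _ in range(12)
        ]
        print(f"FPR={fpr:g}: median={np.median(thresholds):.3f}, "
              f"rMAD={relative_mad(thresholds):.4f}")
```

At a nominal FPR of $10^{-5}$, the threshold is determined by only a handful of the largest OUT scores, so it is more sensitive to sampling noise than a threshold set at $10^{-3}$.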