PANORAMIA: Privacy Auditing of Machine Learning Models without Retraining

Mishaal Kazmi; Hadrien Lautraite; Alireza Akbari; Qiaoyue Tang; Mauricio Soroco; Tao Wang; Sébastien Gambs; Mathias Lécuyer

TL;DR

PANORAMIA is a privacy leakage measurement framework for machine learning models that relies on membership inference attacks using generated data as non-members, which eliminates the common dependency of privacy measurement tools on in-distribution non-member data.

Abstract

We present PANORAMIA, a privacy leakage measurement framework for machine learning models that relies on membership inference attacks using generated data as non-members. By relying on generated non-member data, PANORAMIA eliminates the common dependency of privacy measurement tools on in-distribution non-member data. As a result, PANORAMIA does not modify the model, training data, or training process, and only requires access to a subset of the training data. We evaluate PANORAMIA on ML models for image and tabular data classification, as well as on large-scale language models.
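As a rough illustration of the membership inference step, the sketch below scores a guesser that labels a point "member" when the target model's loss on it is at or below a threshold, with known training points as members and generated points as non-members. This is a simplified stand-in under stated assumptions: PANORAMIA trains an MIA classifier that uses the loss of the target model, whereas here a plain loss threshold replaces the trained attack, and the helper names `precision_recall` and `sweep` are hypothetical.

```python
# Hypothetical sketch: a loss-threshold membership guesser standing in for
# PANORAMIA's trained MIA. Members are known training points; non-members are
# generated points. Assumption: lower loss under the target model f suggests
# membership.

def precision_recall(member_losses, generated_losses, threshold):
    """Guess 'member' whenever the loss of f is <= threshold.

    Returns (precision, recall), or None when nothing is guessed 'member'.
    """
    true_pos = sum(1 for loss in member_losses if loss <= threshold)
    false_pos = sum(1 for loss in generated_losses if loss <= threshold)
    guesses = true_pos + false_pos
    if guesses == 0:
        return None  # precision is undefined without any positive guess
    return true_pos / guesses, true_pos / len(member_losses)


def sweep(member_losses, generated_losses):
    """Evaluate each observed loss value as a threshold, in increasing order."""
    curve = []
    for threshold in sorted(set(member_losses) | set(generated_losses)):
        result = precision_recall(member_losses, generated_losses, threshold)
        if result is not None:
            curve.append((threshold, *result))
    return curve


if __name__ == "__main__":
    # Toy losses, invented for illustration only.
    members = [0.1, 0.2, 0.3, 0.9]
    generated = [0.25, 0.8, 1.0, 1.2]
    for threshold, precision, recall in sweep(members, generated):
        print(f"t={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In PANORAMIA, the MIA's precision and recall are compared against those of a baseline classifier that has no access to the target model, so a sweep like this would be run for both.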

Paper Structure (36 sections, 11 theorems, 58 equations, 16 figures, 11 tables, 1 algorithm)

Introduction
Background and Related Work
PANORAMIA
Quantifying Privacy Leakage with PANORAMIA
Formalizing the Privacy Measurements as a Hypothesis Test
Quantifying Privacy Leakage and Interpretation
Evaluation
Baseline Design and Evaluation
Main Privacy Measurement Results
Leveraging More Data to Improve Privacy Measurements
Impact, Limitations and Future Directions
Notations
Proofs
Proof of Proposition 1
Proof of Proposition 2
...and 21 more sections

Key Result

Proposition 1

Let $\mathcal{G}$ be $c$-close, let $S$ and $X$ be the random variables for $s$ and $x$ from the privacy game (Definition 2), and let $T^b \triangleq B(S, X)$ be the vector of guesses from the baseline. Then, for all $v \in \mathbb{R}$ and all $t$ in the support of $T$:

Figures (16)

Figure 1: PANORAMIA's two-phase privacy audit. Phase 1 trains a generative model $\mathcal{G}$ on member data. Phase 2 trains an MIA on a subset of member data and generated non-member data, using the loss of $f$ on these data points. The performance of the MIA is compared to that of a baseline classifier that does not have access to $f$. Notations are summarized in the notation table in the Notations appendix.
Figure 2: Baseline evaluation with different helper model scenarios.
Figure 3: Precision vs. recall comparison between PANORAMIA and the baseline $b$ for our target models, based on one experiment run. The maximum $c_{\textrm{lb}}$ or $\{c\!+\!\epsilon\}_{\textrm{lb}}$ values set an upper bound on the empirical precision values across different recall levels, indicated by the dashed line.
Figure 4: $\{c+\epsilon\}_{\textrm{lb}}$ (or $c_{\textrm{lb}}$) vs. recall for our target models, reported over $5$ independent experiment runs.
Figure 5: ResNet18, CIFAR-10, $\epsilon$-DP for various $\epsilon$ values.
...and 11 more figures
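Figures 3 and 4 report lower bounds $c_{\textrm{lb}}$ and $\{c+\epsilon\}_{\textrm{lb}}$ at different recall levels. One standard way to turn a count of correct "member" guesses into such a bound, assumed here rather than confirmed by this summary, is a one-sided binomial test: if each of the $k$ guesses were correct with probability at most $e^{\eta}/(1+e^{\eta})$, observing $v$ or more correct guesses becomes unlikely for small $\eta$, so those values of $\eta$ can be rejected. The sketch returns the largest rejected $\eta$ at level $\beta$. The names `binom_sf` and `lower_bound`, the default $\beta = 0.05$, and the search cap are illustrative choices, not the paper's.

```python
import math


def binom_sf(k, n, p):
    """P[Binomial(n, p) >= k], summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)
    return min(1.0, math.exp(peak) * sum(math.exp(t - peak) for t in log_terms))


def lower_bound(num_guesses, num_correct, beta=0.05, cap=20.0, iters=60):
    """Hypothetical helper: largest eta in [0, cap] such that, if each 'member'
    guess were correct with probability e^eta / (1 + e^eta), seeing at least
    num_correct correct guesses out of num_guesses has probability <= beta.

    Returns 0.0 when even eta = 0 cannot be rejected.
    """
    def tail(eta):
        p = 1.0 / (1.0 + math.exp(-eta))  # equals e^eta / (1 + e^eta)
        return binom_sf(num_correct, num_guesses, p)

    if tail(0.0) > beta:
        return 0.0
    lo, hi = 0.0, cap
    for _ in range(iters):
        mid = (lo + hi) / 2
        if tail(mid) <= beta:
            lo = mid  # mid is still rejected, so the bound is at least mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Toy counts, invented for illustration: the MIA is right on 90 of 100
    # 'member' guesses, the baseline on 70 of 100.
    mia_lb = lower_bound(100, 90)
    baseline_lb = lower_bound(100, 70)
    print(f"MIA bound {mia_lb:.3f}, baseline bound {baseline_lb:.3f}")
    print(f"difference {mia_lb - baseline_lb:.3f}")
```

Applied to the MIA's guesses, such a test would estimate something like $\{c+\epsilon\}_{\textrm{lb}}$, and applied to the baseline's guesses, something like $c_{\textrm{lb}}$. Reading their difference as the leakage attributable to $f$ is one plausible interpretation under these assumptions.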

Theorems & Definitions (27)

Definition 1: Differential Privacy (Dwork et al., 2006)
Definition 2: Privacy game
Definition 3: $c$-closeness
Proposition 1
proof
proof
Proposition 2
proof
proof
Corollary 1
...and 17 more
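For reference, the standard $\epsilon$-DP condition behind Definition 1, and the precision ceiling it implies for guessing the membership of a single record when membership and non-membership are equally likely a priori, can be written as below. The precision step is a short Bayes argument added for illustration; the paper's own bounds are stated in terms of $c$ and $c+\epsilon$ in its propositions.

```latex
% Standard epsilon-DP: for all neighboring datasets D, D' and all measurable O
\Pr[\mathcal{M}(D) \in O] \le e^{\epsilon} \, \Pr[\mathcal{M}(D') \in O]

% With a uniform prior on membership S \in \{0, 1\} of one record, Bayes' rule
% bounds the posterior odds after any output o by the same factor:
\frac{\Pr[S = 1 \mid o]}{\Pr[S = 0 \mid o]}
  = \frac{\Pr[o \mid S = 1]}{\Pr[o \mid S = 0]} \le e^{\epsilon}
\quad \Longrightarrow \quad
\text{precision} \le \frac{e^{\epsilon}}{1 + e^{\epsilon}},
\qquad
\epsilon \ge \log \frac{\text{precision}}{1 - \text{precision}}.
```

Judging from the caption of Figure 3, the dashed precision ceiling there appears to take this same form, with $c_{\textrm{lb}}$ or $\{c+\epsilon\}_{\textrm{lb}}$ in place of $\epsilon$.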
