Adversarial Sample-Based Approach for Tighter Privacy Auditing in Final Model-Only Scenarios

Sangyeon Yoon, Wonje Jeung, Albert No

TL;DR

This paper tackles privacy auditing of DP-SGD in the final-model setting, where empirical lower bounds often fall well below the theoretical guarantees. It introduces loss-based input-space auditing to construct worst-case adversarial samples, avoiding reliance on canaries and using the final model weights to maximize distinguishability between neighboring datasets. By formulating and optimizing distance-based and distribution-based loss objectives, the approach yields tighter empirical bounds: on MNIST at a theoretical budget of $\varepsilon = 10.0$, it reaches $\varepsilon_{emp}$ up to $4.914$, compared with a baseline of $4.385$. The method uses data-splitting and $\mu$-GDP conversion to produce auditing results suitable for open-source and API-accessible models, with potential applicability to larger datasets and non-convex settings.

Abstract

Auditing Differentially Private Stochastic Gradient Descent (DP-SGD) in the final model setting is challenging and often results in empirical lower bounds that are significantly looser than theoretical privacy guarantees. We introduce a novel auditing method that achieves tighter empirical lower bounds without additional assumptions by crafting worst-case adversarial samples through loss-based input-space auditing. Our approach surpasses traditional canary-based heuristics and is effective in final model-only scenarios. Specifically, with a theoretical privacy budget of $\varepsilon = 10.0$, our method achieves empirical lower bounds of $4.914$, compared to the baseline of $4.385$ for MNIST. Our work offers a practical framework for reliable and accurate privacy auditing in differentially private machine learning.

Paper Structure

This paper contains 16 sections, 1 theorem, 7 equations, 3 figures, 1 table, 2 algorithms.

Key Result

Corollary 1

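A mechanism is $\mu$-GDP if and only if it is $(\varepsilon, \delta(\varepsilon))$-DP for all $\varepsilon \geq 0$, where $$\delta(\varepsilon) = \Phi\left(-\frac{\varepsilon}{\mu} + \frac{\mu}{2}\right) - e^{\varepsilon}\,\Phi\left(-\frac{\varepsilon}{\mu} - \frac{\mu}{2}\right)$$ and $\Phi$ is the standard normal CDF.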
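
As a rough illustration of how Corollary 1 can be used in practice, the sketch below evaluates the standard closed form from the Gaussian differential privacy literature, $\delta(\varepsilon) = \Phi(-\varepsilon/\mu + \mu/2) - e^{\varepsilon}\Phi(-\varepsilon/\mu - \mu/2)$, and inverts it numerically to obtain $\varepsilon$ for a target $\delta$. The function names `gdp_delta` and `gdp_to_eps`, the bisection search, and the upper search limit `hi` are assumptions made for this example; they are not taken from the paper.

```python
from math import exp
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, provides the CDF Phi


def gdp_delta(eps: float, mu: float) -> float:
    """delta(eps) of a mu-GDP mechanism, using the closed form of Corollary 1."""
    phi = _STD_NORMAL.cdf
    return phi(-eps / mu + mu / 2) - exp(eps) * phi(-eps / mu - mu / 2)


def gdp_to_eps(mu: float, delta: float, hi: float = 100.0, tol: float = 1e-10) -> float:
    """Smallest eps >= 0 with delta(eps) <= delta, found by bisection.

    Relies on delta(eps) being non-increasing in eps. `hi` is an assumed
    upper search limit; the function raises if the answer lies beyond it.
    """
    if gdp_delta(0.0, mu) <= delta:
        return 0.0
    if gdp_delta(hi, mu) > delta:
        raise ValueError("eps exceeds the search limit `hi`")
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gdp_delta(mid, mu) > delta:
            lo = mid  # still too much delta, need a larger eps
        else:
            hi = mid  # constraint satisfied, try a smaller eps
    return hi
```

For example, `gdp_to_eps(mu, 1e-5)` returns the $\varepsilon$ at which a $\mu$-GDP mechanism is $(\varepsilon, 10^{-5})$-DP; the choice of $\delta$ here is illustrative only.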

Figures (3)

  • Figure 1: Overview of the privacy auditing process. The Crafter crafts a canary and constructs two neighboring datasets, $D$ and $D'$, where $D'$ contains the canary. The Trainer trains a model on either $D$ or $D'$ using DP-SGD. The Distinguisher observes the loss values for a specific input to infer whether the model was trained on $D$ or $D'$. We suggest an adversarial sample for tighter auditing.
  • Figure 2: The loss distributions of the canary sample $a_c$ and three adversarial samples $(a_{L_2}, a_{F}, a_{BD})$ in the training set at a fixed privacy budget $\varepsilon = 1.0$. The green distribution represents the loss outputs of models $M$, while the blue distribution represents the loss outputs of models $M'$.
  • Figure 3: Privacy auditing results on the MNIST dataset using $\{128, 256, 384, 512\}$ models.
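
The Distinguisher step described in Figure 1 can be sketched as a threshold test on loss values. The snippet below is a simplified illustration, not the paper's procedure: it assumes the audited input has lower loss under models trained on $D'$, uses point estimates of the false positive and false negative rates, and applies the common GDP auditing estimate $\mu_{emp} = \Phi^{-1}(1 - \mathrm{FPR}) - \Phi^{-1}(\mathrm{FNR})$. The function name `empirical_mu`, the clipping of the rates, and the fixed `threshold` argument are assumptions for this example. A rigorous audit would replace the point estimates with upper confidence bounds and choose the threshold on a separate split of the models.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def empirical_mu(losses_d, losses_d_prime, threshold):
    """Point estimate of mu from a loss-threshold distinguisher.

    losses_d:       losses of the audited input under models trained on D
    losses_d_prime: losses of the same input under models trained on D'
    The distinguisher guesses "trained on D'" when loss <= threshold.
    """
    n0, n1 = len(losses_d), len(losses_d_prime)
    false_pos = sum(loss <= threshold for loss in losses_d)       # D guessed as D'
    false_neg = sum(loss > threshold for loss in losses_d_prime)  # D' guessed as D

    def clipped_rate(count, n):
        # Keep rates strictly inside (0, 1) so the inverse CDF is defined.
        return min(max(count / n, 0.5 / n), 1 - 0.5 / n)

    fpr = clipped_rate(false_pos, n0)
    fnr = clipped_rate(false_neg, n1)
    return _STD_NORMAL.inv_cdf(1 - fpr) - _STD_NORMAL.inv_cdf(fnr)
```

The resulting $\mu_{emp}$ can then be converted to an empirical $\varepsilon$ at a chosen $\delta$ through Corollary 1. A larger separation between the two loss distributions, as in Figure 2, gives lower error rates and therefore a larger $\mu_{emp}$.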

Theorems & Definitions (1)

  • Corollary 1: $\mu$-GDP to $(\varepsilon, \delta)$-DP conversion (Dong et al., 2022)