Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy

Yingyu Lin; Yi-An Ma; Yu-Xiang Wang; Rachel Redberg; Zhiqi Bu

TL;DR

This paper addresses the gap between theory and practice in private learning by introducing Approximate Sample Perturbation (ASAP), an MCMC-based method that preserves pure differential privacy while sampling from an approximate posterior. ASAP perturbs an MCMC sample with noise proportional to its $W_{\infty}$ distance from a DP reference distribution, and uses a constrained Metropolis-adjusted Langevin algorithm (MALA) to obtain convergence guarantees in $W_{\infty}$ distance. A key technical contribution is a TV-to-$W_{\infty}$ conversion lemma that yields pure DP guarantees for approximate samplers and supports end-to-end localization to a bounded domain. The localized ASAP gives a nearly linear-time DP-ERM algorithm with optimal rates for strongly convex and smooth losses under both pure DP and Gaussian DP, the first such result in this regime. The approach offers a practical bridge between private Bayesian learning and computational efficiency, and may apply beyond DP-ERM wherever DP-preserving sampling is required.
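As a rough illustration of the constrained MALA mentioned above, the following is a minimal sketch of a Metropolis-adjusted Langevin sampler restricted to a Euclidean ball. The function name `constrained_mala`, the ball-shaped domain, and the rule of rejecting any proposal that leaves the domain are assumptions made for illustration; the paper's actual algorithm, step-size choice, and constraint handling may differ.

```python
import numpy as np


def constrained_mala(f, grad_f, theta0, step, n_steps, radius, rng):
    """Sketch of MALA targeting p(theta) ∝ exp(-f(theta)) on {||theta|| <= radius}.

    Assumption: proposals that fall outside the ball are simply rejected,
    which keeps the chain inside the domain.
    """
    theta = np.asarray(theta0, dtype=float)
    accepted = 0
    for _ in range(n_steps):
        # Langevin proposal: gradient step plus Gaussian noise.
        mean_fwd = theta - step * grad_f(theta)
        prop = mean_fwd + np.sqrt(2.0 * step) * rng.standard_normal(theta.shape)
        if np.linalg.norm(prop) > radius:
            continue  # outside the constraint set: reject, stay at theta
        # Metropolis-Hastings correction with forward/backward proposal densities.
        mean_bwd = prop - step * grad_f(prop)
        log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (4.0 * step)
        log_q_bwd = -np.sum((theta - mean_bwd) ** 2) / (4.0 * step)
        log_alpha = f(theta) - f(prop) + log_q_bwd - log_q_fwd
        if np.log(rng.uniform()) < log_alpha:
            theta = prop
            accepted += 1
    return theta, accepted / n_steps


# Usage on a toy strongly convex potential f(theta) = 0.5 * ||theta||^2.
rng = np.random.default_rng(0)
sample, acc_rate = constrained_mala(
    f=lambda t: 0.5 * np.sum(t ** 2),
    grad_f=lambda t: t,
    theta0=np.zeros(3),
    step=0.1,
    n_steps=500,
    radius=3.0,
    rng=rng,
)
```

Rejecting out-of-domain proposals is one simple way to respect a constraint while keeping the Metropolis-Hastings correction valid for the restricted target; it is shown here only as a plausible design choice.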

Abstract

Posterior sampling, i.e., using the exponential mechanism to sample from the posterior distribution, provides $\varepsilon$-pure differential privacy (DP) guarantees and does not suffer from the potentially unbounded privacy breach introduced by $(\varepsilon,\delta)$-approximate DP. In practice, however, one needs to apply approximate sampling methods such as Markov chain Monte Carlo (MCMC), thus re-introducing the unappealing $\delta$-approximation error into the privacy guarantees. To bridge this gap, we propose the Approximate SAmple Perturbation (abbr. ASAP) algorithm, which perturbs an MCMC sample with noise proportional to its Wasserstein-infinity ($W_\infty$) distance from a reference distribution that satisfies pure DP or pure Gaussian DP (i.e., $\delta=0$). We then leverage a Metropolis-Hastings algorithm to generate the sample and prove that the algorithm converges in $W_\infty$ distance. We show that by combining our new techniques with a localization step, we obtain the first nearly linear-time algorithm that achieves the optimal rates in the DP-ERM problem with strongly convex and smooth losses.
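The perturbation step described in the abstract can be sketched as follows for the Gaussian DP case. This is a minimal illustration, not the paper's algorithm: the function name `asap_gaussian_perturb`, the parameter `mu_noise`, and in particular the calibration `sigma = 2 * w_inf_bound / mu_noise` (treating twice the $W_\infty$ bound as a sensitivity-like quantity across two neighboring datasets) are assumptions, and the paper's exact constants may differ.

```python
import numpy as np


def asap_gaussian_perturb(theta_tilde, w_inf_bound, mu_noise, rng):
    """Add Gaussian noise scaled to a W_infinity bound on the sampling error.

    theta_tilde : approximate (e.g. MCMC) sample.
    w_inf_bound : assumed upper bound on W_inf(approximate, reference).
    mu_noise    : hypothetical GDP budget spent on the perturbation step.
    """
    theta_tilde = np.asarray(theta_tilde, dtype=float)
    # Assumed calibration: noise standard deviation proportional to the bound.
    sigma = 2.0 * w_inf_bound / mu_noise
    return theta_tilde + sigma * rng.standard_normal(theta_tilde.shape)


# Usage: a smaller W_infinity bound means less added noise.
rng = np.random.default_rng(1)
theta_tilde = np.array([0.5, -1.0, 2.0])
released = asap_gaussian_perturb(theta_tilde, w_inf_bound=1e-3, mu_noise=0.5, rng=rng)
```

The point of the sketch is the proportionality: as the sampler's $W_\infty$ error shrinks, the noise needed to mask it shrinks at the same rate.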

Paper Structure (40 sections, 21 theorems, 32 equations, 3 figures, 2 tables, 4 algorithms)

Introduction
Our Contributions
Related Work
Problem Setup and Preliminaries
Differential Privacy Empirical Risk Minimization (DP-ERM)
Differential Privacy Definitions
Exact Posterior Sampling: DP and Utility Guarantees
Technical Tools
Overview: Why do we use $W_\infty$ distance?
TV Distance to $W_{\infty}$ Distance
MALA with Constraint
Notations.
Main Results: Approximate Sample Perturbation (ASAP)
Approximate Sample Perturbation (ASAP)
Localized ASAP and the End-to-End Guarantees
...and 25 more sections

Key Result

Lemma 4

Assume the loss function is $G$-Lipschitz. Then the posterior sampling mechanism with parameters $\gamma, \lambda > 0$ satisfying $\gamma \leq \mu^2 \lambda / G^2$ satisfies $\mu$-GDP.
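The condition in Lemma 4 can be turned into a small helper that returns the largest admissible $\gamma$ for a given privacy level. This is only an arithmetic sketch of the stated inequality; the function name `max_gamma_for_gdp` and the example values of $\mu$ and $\lambda$ are hypothetical.

```python
def max_gamma_for_gdp(mu, lam, lipschitz_g):
    """Largest gamma allowed by gamma <= mu^2 * lam / G^2 (condition of Lemma 4)."""
    if mu <= 0 or lam <= 0 or lipschitz_g <= 0:
        raise ValueError("mu, lam, and lipschitz_g must all be positive")
    return mu ** 2 * lam / lipschitz_g ** 2


# Hypothetical example: mu = 1, lambda = 9, G = 300 gives gamma <= 9 / 90000.
gamma_max = max_gamma_for_gdp(mu=1.0, lam=9.0, lipschitz_g=300.0)
```

The bound scales quadratically in $\mu$ and inversely quadratically in $G$, so halving the privacy parameter $\mu$ cuts the admissible $\gamma$ by a factor of four.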

Figures (3)

Figure 1: Two examples illustrating the couplings of $\Tilde{p}$ and $p^*$. Let $\zeta^*$ be the optimal coupling of $W_{\infty}(\Tilde{p}, p^*)$, and let $\Tilde{p}\otimes p^*$ denote the independent coupling. In both scenarios, the marginal distributions are $\Tilde{p}$ and $p^*$. Denote $\Delta=W_{\infty}(\Tilde{p}, p^*)$. In Figure (a), when $(\Tilde{\theta},\theta^*)$ follows the optimal coupling, $(\Tilde{\theta},\theta^*)$ is confined within the band $|\Tilde{\theta}-\theta^*|\leq \Delta$. Conversely, Figure (b) shows that when $\Tilde{\theta}$ and $\theta^*$ are independently sampled, the distance $|\Tilde{\theta}-\theta^*|$ can take relatively large values with positive probability. Through the appropriate coupling of the distributions $\Tilde{p}$ and $p^*$, particularly via the optimal coupling $\zeta^*$, we obtain a tight almost-sure bound $\Delta$ on the distance between the two samples $\Tilde{\theta}$ and $\theta^*$.
Figure 2: Excess empirical risks from Table \ref{tab:summary_strongconvex} for strongly convex losses. Here $d=11$, $G=300$, $\alpha=4$, $\varepsilon=1$. Left: $n=10^4$. Right: $n=10^6$.
Figure 3: Excess empirical risks for strongly convex losses on Wine Quality -- Red dataset.
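The contrast drawn in the Figure 1 caption, between the optimal coupling and the independent coupling, can be illustrated numerically in one dimension. The sketch below assumes two equal-size empirical samples; in that setting, pairing sorted values is an optimal coupling for $W_\infty$, so the largest sorted gap equals the empirical $W_\infty$ distance. The helper name `coupling_gaps` and the toy distributions are assumptions for illustration.

```python
import numpy as np


def coupling_gaps(x, y):
    """Return (largest gap under sorted pairing, largest gap under as-drawn pairing).

    For equal-size 1-D samples, the sorted pairing realizes the empirical
    W_infinity distance; the as-drawn pairing mimics independent sampling.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sorted_gap = np.max(np.abs(np.sort(x) - np.sort(y)))
    independent_gap = np.max(np.abs(x - y))
    return sorted_gap, independent_gap


# Toy example: two nearby Gaussians, drawn independently of each other.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=1000)
y = rng.normal(loc=0.1, scale=1.0, size=1000)
w_inf_hat, indep_gap = coupling_gaps(x, y)
```

The sorted pairing never produces a larger maximal gap than the as-drawn pairing, which mirrors the band $|\tilde{\theta}-\theta^*|\leq \Delta$ in panel (a) versus the spread-out pairs in panel (b).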

Theorems & Definitions (31)

Definition 1: Differential privacy [dwork2006calibrating, dwork2014algorithmic]
Definition 2: Hockey-Stick Divergence
Definition 3: Gaussian Differential Privacy [dong2022gaussian]
Lemma 4: GDP of posterior sampling [gopi2022private]
Lemma 5: Pure DP of posterior sampling
Lemma 6: [deklerk2018comparison]
Definition 7
Lemma 8: Converting TV distance to $W_\infty$ distance
Lemma 9
Theorem 1: Mixing time in TV distance
...and 21 more
