Learning with Differentially Private (Sliced) Wasserstein Gradients

David Rodríguez-Vítores, Clément Lalanne, Jean-Michel Loubes

TL;DR

This work tackles private learning for objectives based on Wasserstein distances between empirical distributions. It derives a fully discrete, tractable gradient formulation and a sharp sensitivity bound, enabling Gaussian mechanisms to privately estimate Wasserstein gradients with controlled utility loss. The authors develop a deep-learning DP framework with clipping and privacy accounting, and demonstrate two key applications: privately training Sliced Wasserstein Autoencoders and private in-processing for fairness. Empirical results on image datasets show that the proposed approach maintains accuracy while providing strong privacy guarantees, and the framework also yields privacy-preserving generative capabilities. Overall, the method broadens private distributional learning where optimal transport distances guide the objective, with potential impact on privacy-preserving representation learning and fair ML practice.

Abstract

In this work, we introduce a novel framework for privately optimizing objectives that rely on Wasserstein distances between data-dependent empirical measures. Our main theoretical contribution, based on an explicit formulation of the Wasserstein gradient in a fully discrete setting, is a bound on the sensitivity of this gradient to individual data points, which allows strong privacy guarantees at minimal utility cost. Building on these insights, we develop a deep learning approach that incorporates gradient and activation clipping, techniques originally designed for DP training of problems with a finite-sum structure. We further demonstrate that privacy accounting methods extend to Wasserstein-based objectives, facilitating large-scale private training. Empirical results confirm that our framework effectively balances accuracy and privacy, offering a theoretically sound solution for privacy-preserving machine learning tasks relying on optimal transport distances such as the Wasserstein distance or the sliced-Wasserstein distance.

Paper Structure

This paper contains 42 sections, 7 theorems, 43 equations, 19 figures, 2 tables, 1 algorithm.

Key Result

Lemma 2.3

Given a deterministic function $h$ mapping a dataset to a quantity in $\mathbb{R}^{d}$, one can define the $l_2$-sensitivity of $h$ as $\Delta h := \sup_{\mathbf{D} \sim \mathbf{D}'} \| h(\mathbf{D}) - h(\mathbf{D}') \|_2 \;,$ where the supremum is taken over pairs of neighboring datasets. When this quantity is finite, for any $\sigma > 0$, the Gaussian mechanism defined as $\mathbf{D} \mapsto h(\mathbf{D}) + \sigma \mathcal{N}(0, I_{d}) \;,$ is $(\varepsilon, \delta(\varepsilon))$-DP for any $\varepsilon \geq 0$ where, by noting $\mu = \frac{\Delta
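The Gaussian mechanism of Lemma 2.3 can be illustrated with a minimal sketch. The function names (`gaussian_mechanism`, `delta_for_epsilon`) are hypothetical and not taken from the paper. Because the closed form of $\delta(\varepsilon)$ is incomplete in the statement here, `delta_for_epsilon` uses the standard Gaussian differential privacy conversion with $\mu$ equal to the $l_2$-sensitivity divided by $\sigma$, which is assumed to match the lemma.

```python
import math

import numpy as np


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_mechanism(value, sigma, rng):
    """Release h(D) + sigma * N(0, I_d) for an already computed value h(D)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    value = np.asarray(value, dtype=float)
    return value + sigma * rng.standard_normal(value.shape)


def delta_for_epsilon(epsilon, sensitivity, sigma):
    """delta(epsilon) under the standard Gaussian-DP conversion.

    Assumption: mu = sensitivity / sigma, and
    delta = Phi(-eps/mu + mu/2) - exp(eps) * Phi(-eps/mu - mu/2).
    """
    mu = sensitivity / sigma
    return normal_cdf(-epsilon / mu + mu / 2.0) - math.exp(epsilon) * normal_cdf(
        -epsilon / mu - mu / 2.0
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical query output with l2-sensitivity 1.0, released with sigma = 2.0.
    noisy = gaussian_mechanism([0.3, -1.2, 0.7], sigma=2.0, rng=rng)
    print(noisy)
    print(delta_for_epsilon(epsilon=1.0, sensitivity=1.0, sigma=2.0))
```

Increasing `sigma` lowers $\mu$ and therefore lowers $\delta(\varepsilon)$ at a fixed $\varepsilon$, at the cost of a noisier release.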

Figures (19)

  • Figure 1: Sliced Wasserstein autoencoder with privacy-aware training. The encoder maps input $\mathbf{x}_i$ to a latent code $\varphi_\theta(x_i)$, encouraged to match the prior $P_{\mathbf{Z}}$. The decoder reconstructs $\hat{\mathbf{x}}_i$. The loss combines reconstruction and sliced Wasserstein distance in latent space.
  • Figure 2: Benchmarking the capabilities of a $(\varepsilon,\delta)$-DP sliced Wasserstein autoencoder on MNIST with $\delta = 10^{-5}$ and varying $\varepsilon$. Top: Reconstructed digits from the test dataset. The first row shows the earliest sample of each digit (0–9) in the test set, followed by reconstructions. Bottom: Generated samples from the same model by decoding noise from the latent space.
  • Figure 3: Encoded latent space MNIST samples for the autoencoder, trained under $(\varepsilon,\delta)$-DP, with $\delta = 10^{-5}$ and varying values of $\varepsilon$.
  • Figure 4: Distributions of $\varphi_\theta(X)$ conditioned on the sensitive attribute $A = 0$ and $A = 1$, for models trained to minimize the objective in \ref{eq:loss_SP_text}, across different values of the regularization parameter $\alpha$ and the privacy budget $\varepsilon$. The representation $\varphi_\theta$ corresponds to: (i) predicted class probabilities in a classification task, (ii) predicted values in a bidimensional regression task, and (iii) bidimensional latent representations from an autoencoder.
  • Figure 5: Reconstructed FASHION images from the test dataset. The first row shows the earliest sample of each class (0–9) in the test set. Subsequent rows display the corresponding reconstructions produced by the trained autoencoder under $(\varepsilon,\delta)$-DP, with $\delta = 10^{-5}$ and varying values of $\varepsilon$.
  • ...and 14 more figures
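
The sliced Wasserstein term that Figure 1 places on the latent space can be sketched as follows. This is a non-private Monte Carlo estimate of the squared sliced 2-Wasserstein distance between two point clouds of equal size, not the paper's differentially private gradient. The function name, the number of projections, and the equal-size restriction are assumptions made for illustration.

```python
import numpy as np


def sliced_wasserstein_sq(x, y, n_projections=50, rng=None):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    Assumes x and y are arrays of shape (n, d) with the same n, so that the
    one-dimensional optimal transport plan pairs sorted projections.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("x and y must both have shape (n, d)")
    d = x.shape[1]
    # Draw random directions uniformly on the unit sphere.
    directions = rng.standard_normal((n_projections, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Project both clouds on each direction and sort along the sample axis.
    x_proj = np.sort(x @ directions.T, axis=0)
    y_proj = np.sort(y @ directions.T, axis=0)
    # Average squared differences over samples and directions.
    return float(np.mean((x_proj - y_proj) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent_codes = rng.standard_normal((256, 2)) + 1.0  # hypothetical encoder outputs
    prior_samples = rng.standard_normal((256, 2))  # samples from the prior
    print(sliced_wasserstein_sq(latent_codes, prior_samples, rng=rng))
```

Sorting the projections reduces each one-dimensional transport problem to a pairing of order statistics, which is what makes the sliced distance cheap to evaluate compared with the full Wasserstein distance.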

Theorems & Definitions (20)

  • Definition 2.1
  • Definition 2.2: $(\varepsilon, \delta)$-DP (Dwork et al., 2006)
  • Lemma 2.3: Privacy of the Gaussian mechanism (Corollary of Theorem 2.7, Corollary 3.3 and Corollary 2.13 in Dong et al., 2019)
  • Proposition 3.1
  • Proposition 3.2
  • Theorem 4.1
  • Remark 4.2: Extension to arbitrary dimension
  • Remark 4.3: About the assumptions
  • Remark 4.4: General optimizers and privacy accounting
  • Remark 4.5: Computational complexity
  • ...and 10 more