Instance-Optimal Private Density Estimation in the Wasserstein Distance

Vitaly Feldman; Audra McMillan; Satchit Sivakumar; Kunal Talwar

TL;DR

The work investigates density estimation under differential privacy using the Wasserstein distance, introducing instance-optimality with tight neighborhood definitions and showing rates that adapt to how concentrated or dispersed the target distribution is. It develops a two-pronged approach: (i) a general reduction to Hierarchically Separated Trees (HSTs) enabling instance-adaptive analysis on arbitrary finite metric spaces, and (ii) a specialized, quantile-based DP method for real-valued distributions on $\mathbb{R}$. The authors establish both upper and information-theoretic lower bounds that match up to polylog factors, with explicit three-term rate decompositions capturing non-private sampling error, privacy-induced quantile-interaction costs, and tail-restriction errors. They also extend instance-optimal DP learning to two dimensions and beyond via HST embeddings, and show connections to private learning in TV distance for discrete distributions. Overall, the results provide practical, implementable algorithms that adapt to distribution structure while preserving strong privacy guarantees, representing a substantial step beyond worst-case minimax analyses in private density estimation. $W_1$ and $D_{\infty}$-based instance-optimality play central roles in the theory and guide the algorithmic design and lower-bound arguments.

Abstract

Estimating the density of a distribution from samples is a fundamental problem in statistics. In many practical settings, the Wasserstein distance is an appropriate error metric for density estimation. For example, when estimating population densities in a geographic region, a small Wasserstein distance means that the estimate is able to capture roughly where the population mass is. In this work we study differentially private density estimation in the Wasserstein distance. We design and analyze instance-optimal algorithms for this problem that can adapt to easy instances. For distributions $P$ over $\mathbb{R}$, we consider a strong notion of instance-optimality: an algorithm that uniformly achieves the instance-optimal estimation rate is competitive with an algorithm that is told that the distribution is either $P$ or $Q_P$ for some distribution $Q_P$ whose probability density function (pdf) is within a factor of 2 of the pdf of $P$. For distributions over $\mathbb{R}^2$, we use a different notion of instance optimality. We say that an algorithm is instance-optimal if it is competitive with an algorithm that is given a constant-factor multiplicative approximation of the density of the distribution. We characterize the instance-optimal estimation rates in both these settings and show that they are uniformly achievable (up to polylogarithmic factors). Our approach for $\mathbb{R}^2$ extends to arbitrary metric spaces as it goes via hierarchically separated trees. As a special case our results lead to instance-optimal private learning in TV distance for discrete distributions.
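The "within a factor of 2" neighborhood described above can be checked numerically for discrete distributions. The sketch below is illustrative only: it assumes the symmetric max-divergence $D_{\infty}(P, Q) = \max_x |\ln(P(x)/Q(x))|$, under which a factor-2 pointwise bound on the mass functions corresponds to $D_{\infty}(P, Q) \le \ln 2$. The function names `max_divergence` and `within_factor_two` are hypothetical and do not come from the paper.

```python
import math

def max_divergence(p, q):
    """Symmetric max-divergence between two discrete distributions.

    p and q are dicts mapping points to probability mass.
    Assumption: D_inf(P, Q) = max_x |ln(P(x) / Q(x))|.
    """
    worst = 0.0
    for x in set(p) | set(q):
        px, qx = p.get(x, 0.0), q.get(x, 0.0)
        if px == 0.0 and qx == 0.0:
            continue  # point carries no mass under either distribution
        if px == 0.0 or qx == 0.0:
            return math.inf  # supports differ, so the ratio is unbounded
        worst = max(worst, abs(math.log(px / qx)))
    return worst

def within_factor_two(p, q, tol=1e-12):
    """True when every mass ratio P(x)/Q(x) lies in [1/2, 2]."""
    return max_divergence(p, q) <= math.log(2) + tol

p = {0: 0.5, 1: 0.5}
q = {0: 0.25, 1: 0.75}   # ratios are 2 and 2/3, so q is a valid neighbor of p
r = {0: 0.1, 1: 0.9}     # ratio 5 at point 0, so r is too far from p
```

Here `within_factor_two(p, q)` holds while `within_factor_two(p, r)` does not, which mirrors the idea that an instance-optimal algorithm only has to compete on distributions whose density stays close to that of $P$ everywhere.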

Paper Structure (42 sections, 60 theorems, 202 equations, 1 figure, 5 algorithms)

Introduction
Our Results
Techniques
Distributions over $\mathbb{R}$:
Distributions on HSTs
Preliminaries
Wasserstein Distance:
Differential Privacy
On Instance Optimality
Local Estimation Rates
Locally Minimal Algorithms
Relaxed Definitions
Additional Related Work
Instance Optimality for Differentially Private Statistics:
Other Beyond Worst-Case Results in Central Differential Privacy:
...and 27 more sections

Key Result

Theorem 1.1

Let $\varepsilon, \gamma \in (0,1]$. There is an $\varepsilon$-differentially private algorithm $\mathcal{A}$ such that, for all distributions $P$ supported in $[0,1]$ and all natural numbers $n > \frac{\operatorname{polylog}(1/\gamma)}{\varepsilon}$, there exists a distribution $Q$ (with $D_{\infty}$ …) …, where $n' \approx \frac{n}{\operatorname{polylog}(n/\gamma)}$.

Figures (1)

Figure 1: (Left) The pdf of a sparsely supported distribution on the integers in $[0, 999]$. (Right) The CDF of the same distribution (green, solid line), together with the distribution learnt by a non-private minimax-optimal algorithm (blue, dashed line) and the distribution learnt by a 1-DP instance-optimal algorithm (red, dotted line), both learnt from the same 1600 samples. The $W_1$ error of the minimax-optimal estimate is 13.4, whereas the $W_1$ error of the DP estimate is 0.86. While this example is artificial, it demonstrates the large potential gap between minimax-optimal and instance-optimal algorithms on specific instances.
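The $W_1$ errors quoted in the caption can be read directly off CDF plots like the one described: on the real line, the 1-Wasserstein distance equals the area between the two CDFs, a standard identity. A minimal sketch for distributions on an integer grid with unit spacing follows; the function name `w1_on_integer_grid` is hypothetical, and the code illustrates the identity only, not the paper's estimation algorithm.

```python
from itertools import accumulate

def w1_on_integer_grid(p, q):
    """W1 distance between two pmfs on {0, ..., k-1} with unit spacing.

    Uses the one-dimensional identity W1(P, Q) = sum_i |F_P(i) - F_Q(i)|,
    where F_P and F_Q are the cumulative distribution functions.
    """
    if len(p) != len(q):
        raise ValueError("both pmfs must be defined on the same grid")
    cdf_p = accumulate(p)  # running sums give the CDF at each grid point
    cdf_q = accumulate(q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))

# Moving all mass from point 0 to point 3 costs 3 units of transport.
far = w1_on_integer_grid([1, 0, 0, 0], [0, 0, 0, 1])

# Shifting each half-unit of mass one step to the right costs 1 in total.
shift = w1_on_integer_grid([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])
```

Because the distance accumulates CDF gaps across the whole grid, an estimate that places mass in roughly the right locations scores well even if it misses the exact support points, which is why a sparsely supported distribution can be an easy instance.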

Theorems & Definitions (109)

Theorem 1.1: Informal 1-dimensional result
Theorem 1.2: Informal two-dimensional result
Theorem 1.3: Informal finite metric result
Definition 1.4: Hierarchically Separated Tree
Lemma 1.5: Closed form Wasserstein distance formula
Definition 2.1: $D_{\infty}$-divergence
Definition 2.2
Lemma 2.3: Wasserstein formula over $\mathbb{R}$
Definition 2.4: Hamming Distance
Definition 2.5: Differential Privacy [DworkMNS06j, DworkKMMN06]
...and 99 more
