On the Statistical Complexity of Estimation and Testing under Privacy Constraints

Clément Lalanne, Aurélien Garivier, Rémi Gribonval

TL;DR

This work develops a unified, transport-based framework to derive minimax lower bounds for estimators under differential privacy, extending Le Cam and Fano-style inequalities to $(\varepsilon,\delta)$-DP and $\rho$-zCDP via a Kantorovich transport formulation. It demonstrates a plug-and-play approach using admissible similarity functions and couplings to convert privacy-constrained estimation problems into tractable optimization over couplings, with explicit rates shown for Bernoulli, Gaussian mean, and uniform-support models. The paper also shows that private stochastic gradient Langevin dynamics (DP-SGLD) achieves near-minimax optimal performance for private maximum likelihood in exponential-family settings, with performance governed by the regularity constants of the log-likelihood rather than merely by $n$ or the privacy budget. Overall, the results provide a general methodology and concrete rates for understanding the privacy-utility tradeoff and a practical private MLE method applicable to a broad class of parametric models. The work has implications for deploying privacy-preserving inference in large-scale data applications, offering a rigorous basis for choosing privacy parameters and estimators.

Abstract

The challenge of producing accurate statistics while respecting the privacy of the individuals in a sample is an important area of research. We study minimax lower bounds for classes of differentially private estimators. In particular, we show how to characterize the power of a statistical test under differential privacy in a plug-and-play fashion by solving an appropriate transport problem. With specific coupling constructions, this observation allows us to derive Le Cam-type and Fano-type inequalities not only for regular definitions of differential privacy but also for those based on Rényi divergence. We then proceed to illustrate our results on three simple, fully worked-out examples. In particular, we show that the problem class strongly affects the provable degradation of utility due to privacy. In certain scenarios, we show that maintaining privacy results in a noticeable reduction in performance only when the level of privacy protection is very high. Conversely, for other problems, even a modest level of privacy protection can lead to a significant decrease in performance. Finally, we demonstrate that the DP-SGLD algorithm, a private convex solver, can be employed for maximum likelihood estimation with a high degree of confidence, as it provides near-optimal results with respect to both the size of the sample and the level of privacy protection. This algorithm is applicable to a broad range of parametric estimation procedures, including exponential families.
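To make the privacy-utility tradeoff described in the abstract concrete, here is a minimal sketch that estimates a Bernoulli mean with the standard Laplace mechanism and measures its error by simulation. This is not the paper's construction or its DP-SGLD algorithm; it is a textbook $\varepsilon$-DP baseline, and the function names (`laplace_noise`, `private_bernoulli_mean`, `rmse`) and all parameter values are assumptions chosen for illustration. Neighboring datasets are assumed to differ by replacing one record, so the empirical mean has sensitivity $1/n$.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_bernoulli_mean(x, epsilon, rng):
    """Laplace-mechanism estimate of the mean of 0/1 data (epsilon-DP).

    Replacing one record moves the empirical mean by at most 1/n, so
    Laplace noise of scale 1/(n * epsilon) is enough for epsilon-DP.
    """
    n = len(x)
    noisy = sum(x) / n + laplace_noise(1.0 / (n * epsilon), rng)
    # Clipping is post-processing, so it does not weaken the guarantee.
    return min(1.0, max(0.0, noisy))


def rmse(n, p, epsilon, trials, rng):
    """Monte Carlo root-mean-squared error of the private estimator."""
    total_sq_err = 0.0
    for _ in range(trials):
        x = [1 if rng.random() < p else 0 for _ in range(n)]
        total_sq_err += (private_bernoulli_mean(x, epsilon, rng) - p) ** 2
    return math.sqrt(total_sq_err / trials)


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 1000, 0.3
    for eps in (10.0, 1.0, 0.1, 0.01):
        print(f"epsilon={eps:<5} rmse={rmse(n, p, eps, 200, rng):.4f}")
```

In this sketch the sampling error has standard deviation $\sqrt{p(1-p)/n}$, while the added noise has standard deviation $\sqrt{2}/(n\varepsilon)$, so the privacy term only dominates once $\varepsilon$ is of order $1/\sqrt{n}$ or smaller. That is one simple instance of a problem where privacy costs little until the protection level is very high.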

Paper Structure (54 sections, 18 theorems, 169 equations, 1 algorithm)

Key Result

Theorem 1

If a randomized mechanism $\mathfrak{M}$ satisfies $(\epsilon, \delta)$-DP, then for any test function $\Psi : \operatorname{codom}\left({\mathfrak{M}}\right) \rightarrow \{1, 2\}$ and any probability distributions $\mathbb{P}_1$ and $\mathbb{P}_2$ on $\mathcal{X}^n$ we have [...] Furthermore, when $\mathbb{P}_1 = \mathbb{p}_1^{\otimes n}$ and $\mathbb{P}_2 = \mathbb{p}_2^{\otimes n}$ ...

Theorems & Definitions (32)

  • Theorem 1: Le Cam for $(\epsilon, \delta)$-DP
  • Theorem 2: Le Cam for $\rho$-zCDP
  • Theorem 3: Multiple Distributional Tests for $(\epsilon, \delta)$-DP
  • Theorem 4: Multiple Distributional Tests for $\rho$-zCDP
  • Definition 1
  • Theorem 5
  • proof
  • Theorem 6: Admissible similarity functions for $(\epsilon, \delta)$-DP
  • Theorem 7: Admissible similarity functions for $\rho$-zCDP
  • Example 1: Bernoulli optimal coupling
  • ...and 22 more
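
For reference, the two privacy notions named in the theorem titles above have standard definitions in the differential privacy literature. The display below gives that textbook form and is not quoted from the paper. The notation is assumed here: $X \sim X'$ means two datasets differing in one record, $S$ is a measurable set of outputs, and $D_\alpha$ is the Rényi divergence of order $\alpha$.

$$
\mathfrak{M} \text{ is } (\epsilon,\delta)\text{-DP} \iff \mathbb{P}\left(\mathfrak{M}(X) \in S\right) \le e^{\epsilon}\, \mathbb{P}\left(\mathfrak{M}(X') \in S\right) + \delta \quad \text{for all } X \sim X' \text{ and all } S,
$$

$$
\mathfrak{M} \text{ is } \rho\text{-zCDP} \iff D_\alpha\left(\mathfrak{M}(X) \,\middle\|\, \mathfrak{M}(X')\right) \le \rho\,\alpha \quad \text{for all } X \sim X' \text{ and all } \alpha \in (1, \infty).
$$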