Tukey Depth Mechanisms for Practical Private Mean Estimation

Gavin Brown, Lydia Zakynthinou

TL;DR

This work addresses private mean estimation for multivariate Gaussian data by bringing theoretically optimal Tukey-depth-based methods into practice. It develops practical implementations of the Restricted Tukey Depth Mechanism (REM) and BoxEM, including exact and approximate Tukey depths and random-direction variants, to achieve robust, affine-invariant accuracy in small-sample, low-dimensional regimes. The authors provide detailed algorithms for sampling from the exponential mechanism, PTR-based privacy checks, and both exact and approximate volume computations, along with empirical results showing superior performance to Gaussian and CoinPress baselines under privacy constraints. They also outline a roadmap for extending to higher dimensions via PAC-volume estimation and approximate sampling, highlighting the trade-offs between computation and accuracy and the need for provably practical volume-estimation techniques. Overall, the paper presents a practical, robust toolkit for private multivariate mean estimation and charts a path toward scalable private analytics in moderate dimensions.

Abstract

Mean estimation is a fundamental task in statistics and a focus within differentially private statistical estimation. While univariate methods based on the Gaussian mechanism are widely used in practice, more advanced techniques such as the exponential mechanism over quantiles offer robustness and improved performance, especially for small sample sizes. Tukey depth mechanisms carry these advantages to multivariate data, providing similar strong theoretical guarantees. However, practical implementations fall behind these theoretical developments. In this work, we take the first step to bridge this gap by implementing the (Restricted) Tukey Depth Mechanism, a theoretically optimal mean estimator for multivariate Gaussian distributions, yielding improved practical methods for private mean estimation. Our implementations enable the use of these mechanisms for small sample sizes or low-dimensional data. Additionally, we implement variants of these mechanisms that use approximate versions of Tukey depth, trading off accuracy for faster computation. We demonstrate their efficiency in practice, showing that they are viable options for modest dimensions. Given their strong accuracy and robustness guarantees, we contend that they are competitive approaches for mean estimation in this regime. We explore future directions for improving the computational efficiency of these algorithms by leveraging fast polytope volume approximation techniques, paving the way for more accurate private mean estimation in higher dimensions.
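For readers unfamiliar with the univariate technique the abstract refers to, the following is a minimal sketch of an exponential mechanism over quantiles, here a private median on a bounded range. It is not the authors' implementation: the function name `private_median`, the range bound `R`, and the rank-distance utility are illustrative assumptions following the textbook construction.

```python
import numpy as np

def private_median(data, eps, R, rng=None):
    """Illustrative eps-DP median on [-R, R] via the exponential mechanism.

    Hypothetical helper, not code from the paper. Utility of a candidate
    output is minus its rank distance from n/2, which has sensitivity 1.
    """
    rng = np.random.default_rng(rng)
    z = np.sort(np.clip(np.asarray(data, dtype=float), -R, R))
    n = len(z)
    # The sorted points split [-R, R] into n + 1 intervals of constant utility.
    edges = np.concatenate(([-R], z, [R]))
    lengths = np.diff(edges)
    ranks = np.arange(n + 1)              # data points to the left of interval i
    utility = -np.abs(ranks - n / 2.0)
    # Interval weight is length * exp(eps * utility / 2), computed in log space.
    with np.errstate(divide="ignore"):    # zero-length intervals get weight 0
        log_w = np.log(lengths) + 0.5 * eps * utility
    log_w -= log_w.max()
    p = np.exp(log_w)
    p /= p.sum()
    i = rng.choice(n + 1, p=p)            # pick an interval ...
    return rng.uniform(edges[i], edges[i + 1])   # ... then a uniform point in it
```

In this sketch `R` enters only through the lengths of the two outermost intervals, which is one way to see why quantile-style mechanisms can depend only weakly on the range bound.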

Paper Structure

This paper contains 30 sections, 6 theorems, 45 equations, 13 figures, 2 tables, 3 algorithms.

Key Result

Theorem 3.1

For any $\varepsilon,\delta >0$, the Restricted Tukey Depth Mechanism is $(\varepsilon, \delta)$-differentially private. There exists an absolute constant $C$ such that, for any $0<\alpha,\beta,\varepsilon < 1$, $0<\delta\le \frac{1}{2}$, mean $\mu$, and positive definite $\Sigma$, if $x\sim \mathcal{N}(\mu,\Sigma)^n$ [...] then with probability at least $1-\beta$, $\|\mathcal{A}(x)-\mu\|_{\Sigma}\leq\alpha$.
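The error in the theorem is measured in the norm $\|\cdot\|_{\Sigma}$. Assuming this is the usual Mahalanobis norm (the convention is not spelled out in this summary), it can be written out as follows, which also shows why a guarantee in this norm is affine-invariant:

```latex
% Assumed convention: Mahalanobis norm with respect to the covariance \Sigma.
\|v\|_{\Sigma} \;=\; \bigl\|\Sigma^{-1/2} v\bigr\|_2 \;=\; \sqrt{v^{\top}\Sigma^{-1}v}.
% Under an invertible linear map M, the covariance becomes M \Sigma M^{\top}, and
\|Mv\|_{M\Sigma M^{\top}}^2
  \;=\; v^{\top}M^{\top}\,(M\Sigma M^{\top})^{-1}\,Mv
  \;=\; v^{\top}\Sigma^{-1}v
  \;=\; \|v\|_{\Sigma}^2 .
```

For $\Sigma = I$ this reduces to the ordinary $\ell_2$ norm.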

Figures (13)

  • Figure 1: Tukey depth is a multivariate notion of centrality. The convex hull of a dataset $x$ is exactly the set of points $y$ with $T_x(y)>0$, i.e., nonzero depth. The light gray region is the set of points of depth at least 3; inside that is the set of points of depth at least 4 (gray and hatched). Note that Tukey depth is defined for any point in $\mathbb{R}^d$, not just elements of $x$.
  • Figure 2: When data arise from a nonspherical distribution (a), the Tukey level sets (b) reflect this. In (c), we show the covariance of output distributions of different mechanisms: the empirical mean and the output of BoxEM-Exact have similar shapes. CoinPress reveals its use of spherical Gaussian noise. (Here we use $n=200$; CoinPress has relatively high error. To emphasize shape, we scaled the CoinPress covariance down by a factor of ten.) Interestingly, BoxEM with axis-aligned depth seems to sit between the two methods.
  • Figure 3: Mechanisms' $\ell_2$ error as a function of sample size. Empirical represents error due to sampling. Other lines quantify the "cost of privacy," i.e., the difference between the empirical mean and the private estimate. The Tukey mechanism introduces error comparable to the empirical error at small sample sizes.
  • Figure 4: Mechanisms' $\ell_2$ error as a function of $R$, the range-bounding hyperparameter. Tukey depth mechanisms (BoxEM, REM) exhibit essentially no dependence on $R$. Note the log-log scale. This experiment uses $n=1000$, $d=2$. Random depth used $k=30$ directions.
  • Figure 5: BoxEM's $\ell_2$ error under different notions of depth. As more directions $k$ are used, the random depth mechanism soon performs better than axis-aligned depth, and approaches error close to that of exact depth. This experiment uses $n=200$, $d=2$, and 200 trials. Empirical, Exact, and Axis-Aligned lines each represent a single quantity (which does not depend on $k$). Note the $\log$-scale in $k$.
  • ...and 8 more figures
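The depth notion from the Figure 1 caption, and the random-direction approximation mentioned in the captions of Figures 4 and 5, can be illustrated with a short sketch. It assumes the standard definition of Tukey depth (the minimum, over all directions, of the number of data points in a closed halfspace containing the query point) and replaces the minimum over all directions with a minimum over $k$ random ones. The function name and signature are hypothetical and not taken from the paper's code.

```python
import numpy as np

def random_direction_depth(x, y, k=30, rng=None):
    """Approximate Tukey depth of point y w.r.t. the rows of x.

    Hypothetical helper: minimizes over k random unit directions (and their
    negations) instead of all directions, so the result is an upper bound
    on the exact Tukey depth.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = x.shape[1]
    # Draw k directions uniformly from the unit sphere.
    v = rng.standard_normal((k, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    proj_x = x @ v.T                      # shape (n, k): data projections
    proj_y = v @ y                        # shape (k,):  query projections
    # Count data points on each side of the hyperplane through y.
    above = (proj_x >= proj_y).sum(axis=0)
    below = (proj_x <= proj_y).sum(axis=0)
    return int(min(above.min(), below.min()))

# Usage: corners of a square, queried at its center and at an outside point.
square = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
center_depth = random_direction_depth(square, np.array([0.0, 0.0]), k=200, rng=0)
outside_depth = random_direction_depth(square, np.array([3.0, 0.0]), k=200, rng=0)
```

Because fewer directions can only overestimate depth, using more directions moves the estimate toward the exact value at higher computational cost, consistent with the trend described in the Figure 5 caption.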

Theorems & Definitions (14)

  • Definition 2.1: Indistinguishability
  • Definition 2.2: Adjacency
  • Definition 2.3: Differential Privacy
  • Definition 2.4: Strong Contamination
  • Theorem 3.1: restates Theorem 3.2 of [brown2021covariance]
  • Definition 6.1
  • Definition A.1
  • Definition A.2
  • Lemma A.3
  • Lemma A.4
  • ...and 4 more