Rate Optimality and Phase Transition for User-Level Local Differential Privacy

Alexander Kent, Thomas B. Berrett, Yi Yu

TL;DR

The paper analyses user-level local differential privacy (LDP), where each of $n$ users holds $T$ observations, and studies private estimation of functionals of the underlying distribution. It develops a general infinite-$T$ minimax framework (covering numbers and Fano-type bounds), derives near-matching upper bounds via unary voting (CoverSelect), and characterises phase transitions in the estimation rates as $T$ grows. Across mean estimation (over both $\ell_2$ and $\ell_\infty$ balls), sparse mean estimation, and non-parametric density estimation, the work identifies one regime where the minimax rate matches the item-level rate with $nT$ observations, and a second regime where the rate becomes $e^{-c n\min\{\alpha,\alpha^2\}/d}$ (or $/s$ for sparse problems) and no longer depends on $T$. For $s$-sparse high-dimensional means, user-level LDP can enable consistent estimation even when the ambient dimension $d$ is large, provided $T \gtrsim s \log(d)$ up to logarithmic factors. The paper also demonstrates the phase transitions experimentally and applies the methods to real data, indicating improvements over item-level privacy in several regimes and offering a foundation for further work on privacy-preserving decentralised learning with heterogeneous data.

Abstract

Most of the literature on differential privacy considers the item-level case where each user has a single observation, but a growing field of interest is that of user-level privacy where each of the $n$ users holds $T$ observations and wishes to maintain the privacy of their entire collection. In this paper, we derive a general minimax lower bound, which shows that, for locally private user-level estimation problems, the risk cannot, in general, be made to vanish for a fixed number of users even when each user holds an arbitrarily large number of observations. We then derive matching, up to logarithmic factors, lower and upper bounds for univariate and multidimensional mean estimation, sparse mean estimation and non-parametric density estimation. In particular, with other model parameters held fixed, we observe phase transition phenomena in the minimax rates as $T$, the number of observations each user holds, varies. In the case of (non-sparse) mean estimation and density estimation, we see that, for $T$ below a phase transition boundary, the rate is the same as having $nT$ users in the item-level setting. Different behaviour is, however, observed in the case of $s$-sparse $d$-dimensional mean estimation, wherein consistent estimation is impossible when $d$ exceeds the number of observations in the item-level setting, but is possible in the user-level setting when $T \gtrsim s \log (d)$, up to logarithmic factors. This may be of independent interest for applications as an example of a high-dimensional problem that is feasible under local privacy constraints.
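To make the user-level setting concrete, here is a minimal sketch of a naive baseline, not the paper's CoverSelect procedure. It assumes observations bounded in $[-1, 1]$; each user releases the mean of their $T$ observations plus Laplace noise of scale $2/\alpha$, which is the standard Laplace mechanism for a quantity of range 2. The function names and simulated data are hypothetical.

```python
import numpy as np


def privatise_user_mean(x, alpha, rng):
    """Release one user's sample mean under alpha-LDP via the Laplace mechanism.

    Assumes every observation lies in [-1, 1] (values are clipped to enforce it).
    Replacing the user's entire collection moves the mean by at most 2,
    so Laplace noise with scale 2 / alpha protects the whole collection.
    """
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    return float(x.mean() + rng.laplace(loc=0.0, scale=2.0 / alpha))


def estimate_mean(data, alpha, rng):
    """Average the n private reports; data has shape (n, T)."""
    reports = [privatise_user_mean(row, alpha, rng) for row in data]
    return float(np.mean(reports))


# Hypothetical simulation: n users, T observations each, true mean 0.3.
rng = np.random.default_rng(0)
n, T, alpha = 2000, 50, 1.0
data = 0.3 + 0.5 * rng.uniform(-1.0, 1.0, size=(n, T))  # values in [-0.2, 0.8]
estimate = estimate_mean(data, alpha, rng)
```

The noise added by each user has variance $8/\alpha^2$ regardless of $T$, so the privacy term of this baseline's error, $8/(n\alpha^2)$, does not shrink as users collect more observations. Procedures that do benefit from larger $T$ are what the rates in the paper describe.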

Paper Structure

This paper contains 59 sections, 30 theorems, 379 equations, 12 figures, 6 tables, 13 algorithms.

Key Result

Theorem 1

Given a family of distributions $\mathcal{P}$, let $N(\Delta)$ be the $\Delta$-covering number of the metric space $(\Theta, \rho)$ with $\Theta = \theta(\mathcal{P})$. The infinite-$T$ LDP minimax risk is then bounded in terms of $N(\Delta)$ and $\mathrm{diam}(\Theta) = \sup_{\theta, \theta' \in \Theta} \rho(\theta, \theta')$.
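As an illustration of the covering number $N(\Delta)$ used in Theorem 1, the sketch below works out the simplest case: $\Theta$ an interval on the real line with $\rho(\theta, \theta') = |\theta - \theta'|$. It assumes closed balls of radius $\Delta$ with centres inside $\Theta$; the paper's exact convention is not stated here, and the function names are hypothetical.

```python
import math


def covering_number_interval(lo, hi, delta):
    """Delta-covering number of [lo, hi] under the absolute-value metric.

    Each closed ball of radius delta covers a length of at most 2 * delta,
    so at least (hi - lo) / (2 * delta) balls are needed, and at least one.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    if hi < lo:
        raise ValueError("need lo <= hi")
    return max(1, math.ceil((hi - lo) / (2.0 * delta)))


def cover_centres(lo, hi, delta):
    """Centres of a cover achieving the covering number.

    The interval is split into k equal pieces of width at most 2 * delta,
    and the midpoint of each piece is used as a ball centre.
    """
    k = covering_number_interval(lo, hi, delta)
    width = (hi - lo) / k
    return [lo + (i + 0.5) * width for i in range(k)]


# Theta = [-1, 1] has diameter 2; radius 0.25 needs 2 / 0.5 = 4 balls.
centres = cover_centres(-1.0, 1.0, 0.25)
```

For a fixed interval, $N(\Delta)$ grows like $\mathrm{diam}(\Theta)/(2\Delta)$ as $\Delta$ shrinks, which is the sense in which covering numbers quantify the resolution at which a parameter space can be discretised.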

Figures (12)

  • Figure 1: Comparison of the performance of user-level LDP and $nT$-item-level LDP estimators. A point in a region of the graph indicates the regime for a value of the pair $(T, n\alpha^2/d)$. Constants, constants in exponents, and logarithmic factors are omitted for ease of comparison.
  • Figure 2: Ratio of mean-squared-error between full item level and other estimation schemes in univariate setting (left), and mean-squared-error of estimator in multivariate setting with $d \in \{8, 16, 32\}$ (centre-left, centre-right, right). Error bars denote two standard errors across 500 repetitions.
  • Figure 3: MSE for the sparse mean-estimation procedures. Ribbons indicate one standard error across 500 repetitions.
  • Figure 4: Mean-squared-error of estimator of proportion of traffic stops in the TX-Statewide dataset leading to a search (left), and estimator of proportion of races of subjects of stops (right), with privacy parameters $\alpha \in \{0.5, 1, 2, 4\}$. Mean-squared-error in univariate setting is scaled by $\alpha^2$ to aid comparison across privacy levels. Ribbons and error bars denote two standard errors across 1,000 repetitions.
  • Figure 5: Minimum mean-squared-error across specified sub-interval widths $\Delta$ against $T$, the number of observations per user, across varying privacy levels, with $n = 100$ (left) and $n = 200$ (right).
  • ...and 7 more figures

Theorems & Definitions (68)

  • Remark 1
  • Example 1
  • Theorem 1: General infinite-$T$ rates
  • Corollary 2
  • Remark 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Theorem 6
  • Remark 3
  • ...and 58 more