Mean estimation in the add-remove model of differential privacy

Alex Kulesza, Ananda Theertha Suresh, Yuyan Wang

TL;DR

The paper addresses the problem of estimating the mean of a bounded real-valued dataset under the add-remove model of differential privacy, and shows that the add-remove and swap models yield essentially the same minimum achievable mean-squared error up to o(1). It introduces two technical innovations—a transformed noise approach that optimizes high-privacy performance and an hourglass mechanism that provides optimal low-privacy behavior—establishing an exact, min-max optimal algorithm for all privacy budgets. The hourglass mechanism achieves the univariate staircase marginals in two dimensions, enabling efficient sampling and rigorously matching the swap-model constant via an information-theoretic lower bound. Overall, the work closes the gap between the add-remove and swap models for mean estimation, with practical performance gains demonstrated by experiments across privacy regimes.

Abstract

Differential privacy is often studied under two different models of neighboring datasets: the add-remove model and the swap model. While the swap model is frequently used in the academic literature to simplify analysis, many practical applications rely on the more conservative add-remove model, where obtaining tight results can be difficult. Here, we study the problem of one-dimensional mean estimation under the add-remove model. We propose a new algorithm and show that it is min-max optimal, achieving the best possible constant in the leading term of the mean squared error for all $\varepsilon$, and that this constant is the same as that of the optimal algorithm under the swap model. These results show that the add-remove and swap models give nearly identical errors for mean estimation, even though the add-remove model cannot treat the size of the dataset as public information. We also demonstrate empirically that our proposed algorithm yields at least a factor of two improvement in mean squared error over algorithms frequently used in practice. One of our main technical contributions is a new hourglass mechanism, which might be of independent interest in other scenarios.
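
For orientation, the following is a minimal sketch of a commonly used baseline for add-remove mean estimation: a Laplace-noised sum divided by a Laplace-noised count. It is not the algorithm proposed in the paper. It assumes values bounded in $[0, 1]$ and an even split of the privacy budget between the sum and the count; both choices, and the names `laplace` and `baseline_private_mean`, are illustrative assumptions. Because the dataset size is not public in the add-remove model, the count must be privatized as well.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def baseline_private_mean(values, epsilon, rng=None):
    """Noisy-sum / noisy-count baseline (illustrative, not the paper's method).

    Assumes each value lies in [0, 1]; values outside are clipped.
    """
    if rng is None:
        rng = random.Random()
    clipped = [min(max(float(v), 0.0), 1.0) for v in values]

    # Adding or removing one record changes the sum by at most 1 and the
    # count by exactly 1, so each query has sensitivity 1. Spending
    # epsilon / 2 on each gives Laplace scale 1 / (epsilon / 2) = 2 / epsilon.
    scale = 2.0 / epsilon
    noisy_sum = sum(clipped) + laplace(scale, rng)
    noisy_count = len(clipped) + laplace(scale, rng)

    # Post-processing only: guard against a small or negative noisy count,
    # then project the estimate back onto the known range.
    estimate = noisy_sum / max(noisy_count, 1.0)
    return min(max(estimate, 0.0), 1.0)


if __name__ == "__main__":
    data = [0.3] * 10_000
    print(baseline_private_mean(data, epsilon=1.0, rng=random.Random(0)))
```

This sketch ignores floating-point side channels, which a production implementation would need to address.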

Paper Structure (25 sections, 16 theorems, 95 equations, 5 figures, 3 algorithms)


Key Result

Lemma 1

Let $b \geq 0$. Let $a$ be such that $|a|/b \leq M$. Let $C = \frac{Z_a}{b} - \frac{aZ_b}{b^2}$ and $F = \left \lvert \frac{2MZ^2_b}{b^2}\right \rvert + \left \lvert \frac{2Z_a Z_b}{b^2} \right \rvert$. Then,
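
The conclusion of the lemma is cut off in this extract and is not restated here. As independent context, reading $Z_a$ and $Z_b$ as perturbations of $a$ and $b$ (an interpretation, since the extract does not define them), $C$ is the first-order Taylor term of $\frac{a + Z_a}{b + Z_b} - \frac{a}{b}$, and under the added assumption $|Z_b| \leq b/2$ with $b > 0$ (not taken from the source) the remainder is bounded by $F$. The sketch below checks this numerically; the function name and sampling ranges are illustrative.

```python
import random


def ratio_error_terms(a, b, z_a, z_b, M):
    """Return (exact ratio perturbation, first-order term C, bound term F)."""
    exact = (a + z_a) / (b + z_b) - a / b
    C = z_a / b - a * z_b / b**2
    F = abs(2 * M * z_b**2 / b**2) + abs(2 * z_a * z_b / b**2)
    return exact, C, F


def worst_slack(trials=10_000, seed=0):
    """Smallest observed value of F - |exact - C| over random draws."""
    rng = random.Random(seed)
    M = 1.0
    slack = float("inf")
    for _ in range(trials):
        b = rng.uniform(1.0, 100.0)
        a = rng.uniform(-M * b, M * b)      # enforces |a| / b <= M
        z_a = rng.uniform(-5.0, 5.0)
        z_b = rng.uniform(-b / 2, b / 2)    # assumed condition, not from the source
        exact, C, F = ratio_error_terms(a, b, z_a, z_b, M)
        slack = min(slack, F - abs(exact - C))
    return slack


if __name__ == "__main__":
    # A non-negative result (up to rounding) means F bounded the remainder
    # in every sampled case.
    print(worst_slack())
```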

Figures (5)

  • Figure 1: The Laplace mechanism applied to different linear transformations of $S(q)$. The middle plot shows the original sensitivity space, where red denotes the noise ball used by Algorithm \ref{alg:very_old}, orange the ball used by Algorithm \ref{alg:current}, and blue the ball used by Algorithm \ref{alg:new}, which is the smallest convex shape possible. The left plot shows the transformed space used by Algorithm \ref{alg:new}, and the right plot shows the transformed space used by Algorithm \ref{alg:current}.
  • Figure 2: Error comparison of different algorithms with varying $\varepsilon$ or $\mu$.
  • Figure 3: Hourglass and two-dimensional staircase mechanisms in the low-privacy regime, with varying $\mu$.

Theorems & Definitions (27)

  • Definition 1: Differential privacy [dwork2014algorithmic]
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • proof
  • Corollary 1
  • Theorem 2
  • proof
  • Lemma 3: Marginal distribution
  • proof
  • ...and 17 more