Differentially Private Selection using Smooth Sensitivity

Iago Chaves, Victor Farias, Amanda Perez, Diego Mesquita, Javam Machado

TL;DR

The paper tackles differentially private selection over discrete outputs by introducing Smooth Noisy Max (SNM), a mechanism that adds noise scaled to a smooth upper bound on local sensitivity and thereby achieves tighter utility than global-sensitivity baselines. It proves $(\varepsilon,\delta)$-DP guarantees for SNM under admissible noise distributions and derives utility bounds, including the tail bound $\Pr[\xi(\mathscr{A},\mathbf{x})\ge t]\le |\mathscr{R}|\exp(-\varepsilon t/(4\mathscr{S}_{u,\beta}(\mathbf{x})))$. The authors apply SNM to three downstream tasks (percentile selection, greedy decision trees, and random forests) and report improved accuracy and reduced error across multiple datasets relative to the EM, PF, and LD variants. The work extends smooth sensitivity to discrete private selection, offering practical tools that reduce noise without weakening the privacy guarantee, and points to future directions such as element local sensitivity and multi-objective private selection.
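To get a feel for the tail bound quoted in this summary, the sketch below simply evaluates $|\mathscr{R}|\exp(-\varepsilon t/(4\mathscr{S}_{u,\beta}(\mathbf{x})))$ and inverts it for $t$. The function and parameter names are illustrative and not from the paper; clipping the bound at 1 is an added convenience, and the smooth sensitivity value is assumed to be supplied by the caller.

```python
import math


def snm_tail_bound(t, epsilon, smooth_sens, num_candidates):
    """Evaluate |R| * exp(-epsilon * t / (4 * S)), clipped to 1.

    smooth_sens stands for the smooth sensitivity S_{u,beta}(x); it is an
    input here, not something this sketch computes.
    """
    if epsilon <= 0 or smooth_sens <= 0 or num_candidates <= 0:
        raise ValueError("epsilon, smooth_sens and num_candidates must be positive")
    bound = num_candidates * math.exp(-epsilon * t / (4.0 * smooth_sens))
    return min(1.0, bound)  # a probability bound above 1 carries no information


def error_threshold(target_prob, epsilon, smooth_sens, num_candidates):
    """Solve |R| * exp(-epsilon * t / (4 * S)) = target_prob for t."""
    if not 0.0 < target_prob <= 1.0:
        raise ValueError("target_prob must lie in (0, 1]")
    return 4.0 * smooth_sens / epsilon * math.log(num_candidates / target_prob)


# Hypothetical setting: 100 candidates, epsilon = 1, two smooth sensitivity values.
t_large = error_threshold(0.05, epsilon=1.0, smooth_sens=2.0, num_candidates=100)
t_small = error_threshold(0.05, epsilon=1.0, smooth_sens=1.0, num_candidates=100)
```

Because the threshold is linear in the sensitivity term, halving the smooth sensitivity halves the value of $t$ at which the bound reaches a given probability.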

Abstract

Differentially private selection mechanisms offer strong privacy guarantees for queries aiming to identify the top-scoring element r from a finite set R, based on a dataset-dependent utility function. While selection queries are fundamental in data science, few mechanisms effectively ensure their privacy. Furthermore, most approaches rely on global sensitivity to achieve differential privacy (DP), which can introduce excessive noise and impair downstream inferences. To address this limitation, we propose the Smooth Noisy Max (SNM) mechanism, which leverages smooth sensitivity to yield provably tighter (upper bounds on) expected errors compared to global sensitivity-based methods. Empirical results demonstrate that SNM is more accurate than state-of-the-art differentially private selection methods in three applications: percentile selection, greedy decision trees, and random forests.
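As a rough illustration of why a smaller noise scale helps selection, here is a minimal report-noisy-max sketch using only the Python standard library. It is not the paper's SNM algorithm: the Laplace sampler, the `noise_scale` parameter, and all names are assumptions made for illustration, and calibrating the scale to $\varepsilon$ and to the smooth sensitivity (which is what a privacy guarantee would rest on) is left to the caller.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_max(utilities, noise_scale, rng=None):
    """Return the candidate whose utility plus independent Laplace noise is largest.

    utilities: dict mapping each candidate to its utility on the dataset.
    noise_scale: Laplace scale; assumed to be calibrated elsewhere.
    """
    if not utilities:
        raise ValueError("utilities must not be empty")
    if noise_scale <= 0:
        raise ValueError("noise_scale must be positive")
    rng = rng or random.Random()
    best, best_score = None, float("-inf")
    for candidate, utility in utilities.items():
        score = utility + laplace(noise_scale, rng)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Hypothetical utilities; "b" is the true top-scoring element.
utilities = {"a": 1.0, "b": 5.0, "c": 3.0}
rng = random.Random(1)
hits_small_scale = sum(noisy_max(utilities, 0.5, rng) == "b" for _ in range(2000))
hits_large_scale = sum(noisy_max(utilities, 5.0, rng) == "b" for _ in range(2000))
```

Comparing the two counters shows the effect the abstract describes: the smaller the noise scale, the more often the true top-scoring element is returned. Floating-point Laplace sampling of this kind is not suitable for production privacy code.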

Paper Structure

This paper contains 26 sections, 17 theorems, 39 equations, 5 figures, 4 algorithms.

Key Result

Lemma 2.3

Note that a mechanism $\mathscr{A}$ is $(\varepsilon, \delta)$-differentially private if and only if, for every two neighboring databases $\mathbf{x}, \mathbf{y}$, $D_{\infty}^\delta (\mathscr{A}(\mathbf{x})||\mathscr{A}(\mathbf{y})) \leq \varepsilon$ and $D_{\infty}^\delta (\mathscr{A}(\mathbf{y})||\mathscr{A}(\mathbf{x})) \leq \varepsilon$.
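The condition in Lemma 2.3 can be checked numerically for small discrete output distributions. The sketch below relies on the standard equivalent form of the bound, $\Pr[\mathscr{A}(\mathbf{x})\in S]\le e^{\varepsilon}\Pr[\mathscr{A}(\mathbf{y})\in S]+\delta$ for every event $S$, and on the fact that for discrete distributions the worst-case $S$ collects exactly the outcomes with $p_i > e^{\varepsilon} q_i$. The function names and the randomized-response example are illustrative and not taken from the paper.

```python
import math


def excess_mass(p, q, epsilon):
    """Return sup over events S of P(S) - exp(epsilon) * Q(S).

    p and q are dicts mapping outcomes to probabilities. The supremum is
    attained by the set of outcomes where p exceeds exp(epsilon) * q.
    """
    outcomes = set(p) | set(q)
    factor = math.exp(epsilon)
    return sum(max(0.0, p.get(o, 0.0) - factor * q.get(o, 0.0)) for o in outcomes)


def is_indistinguishable(p, q, epsilon, delta, tol=1e-12):
    """Check the (epsilon, delta) condition in both directions."""
    return (excess_mass(p, q, epsilon) <= delta + tol
            and excess_mass(q, p, epsilon) <= delta + tol)


# Hypothetical example: randomized response that reports the true bit w.p. 0.75.
out_x = {0: 0.75, 1: 0.25}  # output distribution on database x
out_y = {0: 0.25, 1: 0.75}  # output distribution on a neighboring database y

pure_dp_ok = is_indistinguishable(out_x, out_y, math.log(3), 0.0)
too_small_eps = is_indistinguishable(out_x, out_y, 1.0, 0.0)
with_slack = is_indistinguishable(out_x, out_y, 1.0, 0.08)
```

The check must be repeated over every neighboring pair to certify a mechanism; a single pair only illustrates the two-sided condition.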

Figures (5)

  • Figure 1: The utility function $u_p$ maps elements of $\mathbf{x}$ to a utility value. In this example, $x_k = 5$, and $\boldsymbol{u}_p^{\mathbf{x}}$ is the utility vector for the dataset $\mathbf{x}$. The subsets $\boldsymbol{j}_l$ and $\boldsymbol{j}_g$ partition the dataset into elements less than and greater than index $k$, respectively. The final equation computes $j$ as the maximum of the summed utility values, i.e., the number of elements that have the same value as $x_k$ in each partition.
  • Figure 2: Comparison of private selection methods for percentile selection. Plots show the absolute expected error (AEE) as a function of the privacy budget $\varepsilon \in [10^{-1}, 10^2]$. The x-axis uses log scale. Overall, SNM-LAP and SNM-T achieve lower expected errors than other methods for all $\varepsilon$.
  • Figure 3: Local Dampening (LD2) probabilities on the HEPTH dataset with $p=50$. The first plot demonstrates that the probability of selecting element $41$ (median) is low when the privacy budget is minimal. The second graph depicts a scenario with very low expected error, suggesting that the observed low expected error occurs by chance. The last plot illustrates that with an increased privacy budget, LD2 converges effectively.
  • Figure 4: Comparison of private selection methods for the greedy decision tree application. The plots show the mean accuracy of the greedy decision tree experiments over 5 runs of 10-fold cross-validation, where $d \in \{2, 5\}$ and $\varepsilon \in \{0.01, 0.05, 0.1, 0.5, 1, 2\}$. The x-axis uses a log scale. All SNM variants consistently achieve higher accuracy than competing methods. The performance of SNM-T is especially notable, as it ensures $\varepsilon$-DP.
  • Figure 5: Comparison of private selection methods for the random forest problem. The plots show mean accuracy for the WDP, EM, PF, LD, and SNM variants of a random forest with 32 random trees, varying $\varepsilon \in \{0.01, 0.05, 0.1, 1, 2\}$. The x-axis uses a log scale. The SNM variants consistently reach the accuracy level of the standard non-private random forest. Compared with the other private selection methods, the SNM variants achieve higher accuracy for almost all $\varepsilon$ values.

Theorems & Definitions (36)

  • Definition 2.1: $(\varepsilon, \delta)$-Differential privacy (Dwork and Roth, 2014)
  • Definition 2.2: $\delta$-Approximate Max Divergence (Dwork and Roth, 2014)
  • Lemma 2.3: Approx. Differential Privacy (Dwork and Roth, 2014)
  • Definition 2.4: Global sensitivity (Dwork and Roth, 2014)
  • Definition 2.5: Local sensitivity at distance $t$ (Nissim et al., 2007)
  • Definition 2.6: Smooth bound (Nissim et al., 2007)
  • Definition 2.7: Smooth sensitivity (Nissim et al., 2007)
  • Corollary 2.8: Smooth sensitivity upper bound
  • Definition 2.9: Admissible Noise Distribution (Nissim et al., 2007)
  • Definition 3.1: Exponential Mechanism (McSherry and Talwar, 2007)
  • ...and 26 more
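
The sensitivity notions listed in Definitions 2.4 to 2.7 can be related with a few lines of code. The sketch below assumes the standard form from the smooth sensitivity framework, $\max_{t \ge 0} e^{-\beta t}\,LS^{(t)}(\mathbf{x})$, where $LS^{(t)}$ is the local sensitivity at distance $t$; the paper's own statements are not reproduced in this listing, so its notation may differ. The input list, its values, and the function name are hypothetical, and the list is assumed to extend far enough that the local sensitivity has already saturated at its last entry.

```python
import math


def smooth_sensitivity(local_sens_at_distance, beta):
    """Return the max over t of exp(-beta * t) * LS^(t)(x).

    local_sens_at_distance[t] holds the local sensitivity at distance t,
    for t = 0, 1, 2, ... up to the point where it stops growing.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if not local_sens_at_distance:
        raise ValueError("need at least the local sensitivity at distance 0")
    return max(math.exp(-beta * t) * ls
               for t, ls in enumerate(local_sens_at_distance))


# Hypothetical values: small sensitivity near x, saturating at 5.0 far away.
ls_values = [1.0, 1.0, 2.0, 5.0, 5.0]
smooth = smooth_sensitivity(ls_values, beta=0.5)
global_sens = max(ls_values)  # plays the role of global sensitivity in this toy case
```

In this toy case the result lies between the local sensitivity at distance 0 and the largest listed value, which is the gap that noise scaled to a smooth bound is meant to exploit.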