The Correlated Gaussian Sparse Histogram Mechanism

Christian Janos Lebeda, Lukas Retschmeier

TL;DR

This work addresses releasing high-dimensional sparse histograms under $(\varepsilon,\delta)$-DP by extending the Gaussian Sparse Histogram Mechanism with correlated noise, yielding the Correlated Stability Histogram (CSH). By exploiting $k$-sparsity and monotonicity, CSH reduces the total noise and lowers the threshold, achieving up to a $2\times$ improvement in utility while preserving privacy. The authors provide both an add-the-deltas analysis and a tighter, case-based bound, and extend the framework to top-$k$ queries and discrete Gaussian noise. They validate the approach with experiments showing substantial utility gains and discuss practical extensions to sparsity thresholds and aggregators. The work thus advances private release of sparse, high-dimensional histograms with practical impact for large-scale data analytics.

Abstract

We consider the problem of releasing a sparse histogram under $(\varepsilon, \delta)$-differential privacy. The stability histogram independently adds noise from a Laplace or Gaussian distribution to the non-zero entries and removes those noisy counts below a threshold. Thereby, the introduction of new non-zero values between neighboring histograms is only revealed with probability at most $\delta$, and typically, the value of the threshold dominates the error of the mechanism. We consider the variant of the stability histogram with Gaussian noise. Recent works ([Joseph and Yu, COLT '24] and [Lebeda, SOSA '25]) reduced the error for private histograms using correlated Gaussian noise. However, these techniques cannot be directly applied in the very sparse setting. Instead, we adopt Lebeda's technique and show that adding correlated noise only to the non-zero counts allows us to reduce the magnitude of noise when we have a sparsity bound. This, in turn, allows us to use a lower threshold by up to a factor of $1/2$ compared to the non-correlated noise mechanism. We then extend our mechanism to a setting without a known bound on sparsity. Additionally, we show that correlated noise can give a similar improvement for the more practical discrete Gaussian mechanism.
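
As a rough illustration of the baseline described in the abstract, the following is a minimal Python sketch of a stability histogram with independent (non-correlated) Gaussian noise. It is not the paper's correlated mechanism. The function name `gaussian_stability_histogram` and its parameters `counts`, `sigma`, `threshold`, and `rng` are hypothetical names chosen for this example, and the sketch assumes that `sigma` and `threshold` have already been calibrated to the desired $(\varepsilon, \delta)$ guarantee, which is not shown here.

```python
import random


def gaussian_stability_histogram(counts, sigma, threshold, rng=None):
    """Illustrative baseline: noise the non-zero counts, then threshold.

    counts    -- dict mapping a key to its true (non-negative) count
    sigma     -- standard deviation of the Gaussian noise (assumed calibrated)
    threshold -- noisy counts below this value are removed (assumed calibrated)
    rng       -- optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    released = {}
    for key, count in counts.items():
        if count == 0:
            # Zero entries are never noised or released, so the output
            # stays sparse even when the domain is very large.
            continue
        noisy = count + rng.gauss(0.0, sigma)  # independent noise per entry
        if noisy >= threshold:
            released[key] = noisy
    return released
```

A call such as `gaussian_stability_histogram({"a": 120, "b": 2, "c": 0}, sigma=5.0, threshold=30.0)` would typically release only the large counter `"a"`. This shows why the threshold tends to dominate the error: every true count that lands below it after noising is dropped entirely.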

Paper Structure

This paper contains 20 sections, 18 theorems, 14 equations, 3 figures, 1 table, 4 algorithms.

Key Result

Theorem 1.1

Let $H(\mathbf{X}) = \sum_{i=1}^n X_i$ denote a histogram with bounded sparsity, where $\mathbf{X} = (X_1, \dots, X_n)$ and $X_i \in \{0, 1\}^d$. If the GSHM privately releases $H(\mathbf{X})$ under $(\varepsilon, \delta)$-DP with noise magnitude $\sigma$ and removes noisy counters below a threshold $\tau$

Figures (3)

  • Figure 1: Examples of different kinds of neighboring datasets for the Gaussian Sparse Histogram Mechanism where a single user can contribute to at most four counters, thus $\|X_i\|_0 \leq 4$. These counters are depicted in green. a) For the example on the left, the mechanism behaves exactly as running the Gaussian mechanism on a restricted domain. b) In the case in the middle, we only have to bound the probability that one of the green elements together with the additive noise term exceeds the threshold $1 + \tau$. c) The case on the right is the most difficult case for the privacy analysis because the overall $\delta$ value depends on both kinds of changes.
  • Figure 2: The separation technique for $\delta_{\text{gauss}}$ and $\delta_{\text{inf}}$ used in the theorems labeled thm:algorithm-privacy-add-the-deltas and th:max-the-delta. The idea is to construct an intermediate histogram $H(\hat{\mathbf{X}})$ with the same support $U'$ as $H(\mathbf{X}')$ that reflects only the changes that can cause infinite privacy loss between $H(\mathbf{X})$ and $H(\mathbf{X}')$.
  • Figure 3: The results of our experiments. Using the same parameters as in [Wilkins24-GaussianSparseHistogramMechanism], the graphs show the minimum $\tau$ required to get $(0.35, 10^{-5})$-DP guarantees for a noise level $\sigma$. The green line denotes the tight analysis of [Wilkins24-GaussianSparseHistogramMechanism], the red line shows the add-the-deltas approach [googlelibthreshold], and the blue and orange lines are our results. The marked points denote the minimum $\tau$ for each technique. a) Uses the same parameters as in [Wilkins24-GaussianSparseHistogramMechanism]. As high values of $k$ are preferable for our mechanism, we bring down the threshold from $\approx 13950$ to $\approx 7860$, lowering it by $\approx 43\%$. b) We get some small improvement even for small $k$ values. Note that since our mechanism adds two noise samples, the plot shows the total magnitude of noise.

Theorems & Definitions (25)

  • Theorem 1.1: The Correlated Stability Histogram (Informal)
  • Definition 2.1: Neighboring datasets
  • Definition 2.2: $(\varepsilon, \delta)$-differential privacy [dworkRothBook]
  • Definition 2.3: Sensitivity space and $\ell_2$ sensitivity
  • Definition 2.4: Gaussian Distribution
  • Lemma 2.5: The Gaussian Mechanism [Balle18-AnalyticalGaussian]
  • Lemma 2.6: add-the-deltas [googlelibthreshold]
  • Lemma 2.7: Exact Privacy Analysis of the GSHM [Wilkins24-GaussianSparseHistogramMechanism]
  • Lemma 2.8: The Correlated Gaussian Mechanism [lebeda2024]
  • Definition 3.1: $k$-sparse monotonic histogram
  • ...and 15 more