Bayesian Frequency Estimation Under Local Differential Privacy With an Adaptive Randomized Response Mechanism

Soner Aydin, Sinan Yildirim

TL;DR

The paper tackles online frequency estimation for a discrete K-category distribution under local differential privacy (LDP) by introducing AdOBEst-LDP, which adaptively tunes a randomized response mechanism to increase the information carried by future privatized data. The core idea is to constrain privatization to a high-utility subset S via the randomly restricted randomized response (RRRR) mechanism, with S chosen at each step using posterior samples and one of several utility metrics. Estimation is performed in a Bayesian framework using scalable posterior sampling via stochastic gradient Langevin dynamics (SGLD). The authors establish posterior consistency and, under exact sampling, asymptotic optimality of subset selection. Numerical results show improved accuracy over non-adaptive and semi-adaptive baselines across privacy levels. The work advances practical online frequency estimation under LDP by combining adaptive mechanism design with scalable Bayesian inference.

Abstract

Frequency estimation plays a critical role in many applications involving personal and private categorical data. Such data are often collected sequentially over time, making it valuable to estimate their distribution online while preserving privacy. We propose AdOBEst-LDP, a new algorithm for adaptive, online Bayesian estimation of categorical distributions under local differential privacy (LDP). The key idea behind AdOBEst-LDP is to enhance the utility of future privatized categorical data by leveraging inference from previously collected privatized data. To achieve this, AdOBEst-LDP uses a new adaptive LDP mechanism to collect privatized data. This LDP mechanism constrains its output to a subset of categories that "predicts" the next user's data. By adapting the subset selection process to the past privatized data via Bayesian estimation, the algorithm improves the utility of future privatized data. To quantify utility, we explore various well-known information metrics, including (but not limited to) the Fisher information matrix, total variation distance, and information entropy. For Bayesian estimation, we utilize posterior sampling through stochastic gradient Langevin dynamics, a computationally efficient approximate Markov chain Monte Carlo (MCMC) method. We provide a theoretical analysis showing that (i) the posterior distribution of the category probabilities targeted with Bayesian estimation converges to the true probabilities even for approximate posterior sampling, and (ii) AdOBEst-LDP eventually selects the optimal subset for its LDP mechanism with high probability if posterior sampling is performed exactly. We also present numerical results to validate the estimation accuracy of AdOBEst-LDP. Our comparisons show its superior performance against non-adaptive and semi-adaptive competitors across different privacy levels and distributional parameters.
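
For orientation, the adaptive mechanism described above builds on randomized response. The sketch below shows only the standard, non-adaptive K-ary randomized response and a simple moment-based frequency estimate. It is not the paper's RRRR mechanism or its Bayesian estimator, whose details (subset selection, the role of $\epsilon_{1}$, SGLD updates) are not given in this summary. All function names and parameters are hypothetical and chosen for illustration.

```python
import math
import random
from collections import Counter


def rr_probabilities(K, eps):
    """Standard K-ary randomized response: (P(report truth), P(report a given other category))."""
    denom = math.exp(eps) + K - 1
    return math.exp(eps) / denom, 1.0 / denom


def randomized_response(x, K, eps, rng):
    """Privatize a category x in {0, ..., K-1} (illustrative, non-adaptive baseline)."""
    p_keep, _ = rr_probabilities(K, eps)
    if rng.random() < p_keep:
        return x
    # Otherwise pick uniformly among the K-1 categories different from x.
    y = rng.randrange(K - 1)
    return y if y < x else y + 1


def estimate_frequencies(reports, K, eps):
    """Moment estimator: invert P(Y = k) = p_other + (p_keep - p_other) * theta_k."""
    p_keep, p_other = rr_probabilities(K, eps)
    n = len(reports)
    counts = Counter(reports)
    return [(counts.get(k, 0) / n - p_other) / (p_keep - p_other) for k in range(K)]


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.5, 0.2, 0.1, 0.1, 0.1]  # assumed true distribution, for the demo only
    data = rng.choices(range(5), weights=theta, k=20000)
    reports = [randomized_response(x, 5, 2.0, rng) for x in data]
    print(estimate_frequencies(reports, 5, 2.0))
```

The ratio of the two probabilities returned by `rr_probabilities` equals $e^{\epsilon}$, which is what makes this baseline $\epsilon$-LDP. The moment estimator can return values outside $[0, 1]$ at small sample sizes, one reason a Bayesian treatment such as the paper's may be preferable.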

Paper Structure

This paper contains 35 sections, 25 theorems, 151 equations, 6 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

RRRR is $\epsilon$-DP if $\epsilon_{1} \leq \epsilon$ and

Figures (6)

  • Figure 1: AdOBEst-LDP: A framework for Adaptive and Online Bayesian Estimation of categorical distributions with Local Differential Privacy.
  • Figure 2: $\mathbb{P}_{\theta}(Y = X)$ vs $\theta_{i}/\theta_{i+1}$ for all $i = 1, \ldots, K-1$ with $K = 20$. Left: $\epsilon = 1$, Right: $\epsilon = 5$.
  • Figure 3: TV distance (the paper's error measure) for $K \in \{10, 20\}$, $\epsilon_{1} = 0.8 \epsilon$.
  • Figure 4: TV distance (the paper's error measure) for $K \in \{10, 20\}$, $\epsilon_{1} = 0.9 \epsilon$.
  • Figure 5: Average cardinalities of the subsets selected by each method, for $K \in \{10, 20\}$, $\epsilon_{1} = 0.8 \epsilon$
  • ...and 1 more figure

Theorems & Definitions (52)

  • Definition 1: Local differential privacy
  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Theorem 2
  • Example 1: Numerical illustration
  • Proposition 4
  • Theorem 3
  • Theorem 4
  • ...and 42 more