Privacy Guarantees in Posterior Sampling under Contamination

Shenggang Hu, Louis Aslett, Hongsheng Dai, Murray Pollock, Gareth O. Roberts

TL;DR

This paper addresses differential privacy for Bayesian posterior sampling in settings with unbounded observation and parameter spaces by introducing Huber's contamination model, which replaces data points with draws from a heavy-tailed density at rate $p_n$. It derives $(\epsilon_n,\delta_n)$-DP guarantees and shows that $p_n \to 0$ yields asymptotically zero privacy loss while preserving information through posterior contraction, with Fisher information satisfying $I_{p_n}(\theta^*) \to I(\theta^*)$. The approach relaxes bounded-space constraints common in DP literature and demonstrates, both theoretically and via simulations, that contamination can improve finite-sample privacy-utility trade-offs without sacrificing asymptotic privacy. This provides a robust, practically implementable framework for privacy-preserving Bayesian inference in large-scale, unbounded settings, with guidance on empirical DP estimation and model verification. The work has implications for privacy-aware Bayesian analyses in governance, tech platforms, and scientific research where data participants' privacy must be rigorously protected while maintaining inferential integrity.

Abstract

In recent years, differential privacy has been adopted by tech companies and governmental agencies as the standard for measuring privacy in algorithms. In this article, we study differential privacy in Bayesian posterior sampling settings. We begin by considering differential privacy in the most common privatization setting, in which Laplace or Gaussian noise is simply injected into the output. In an effort to achieve better differential privacy, we consider adopting Huber's contamination model for use within privacy settings, and replace data points at random with samples from a heavy-tailed distribution (instead of injecting noise into the output). We derive bounds for the differential privacy level $(\epsilon,\delta)$ of our approach, without the need to impose the restriction of a bounded observation and parameter space, which is commonly used by existing approaches and literature. We further consider, for our approach, the effect of sample size on the privacy level and the convergence rate of $(\epsilon,\delta)$ to zero. Asymptotically, our contamination approach is fully private at no cost of information loss. We also provide some examples of inference models to which our setup is applicable, with a theoretical estimate of the convergence rate, together with some simulations.
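The contamination step described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `huber_contaminate`, the choice of a Student-t distribution with one degree of freedom as the heavy-tailed replacement, and the rate $p_n = n^{-1/2}$ are all hypothetical choices made here for the example. The source only states that data points are replaced at random, at rate $p_n$, by heavy-tailed draws.

```python
import numpy as np


def huber_contaminate(data, p, rng, df=1.0):
    """Replace each entry, independently with probability p, by a heavy-tailed draw.

    The Student-t replacement distribution (df=1 is Cauchy) is an assumption
    made for illustration; the source only requires a heavy-tailed density.
    Returns the contaminated copy and the boolean mask of replaced entries.
    """
    data = np.asarray(data, dtype=float)
    replaced = rng.random(data.shape) < p          # Bernoulli(p) mask per entry
    heavy = rng.standard_t(df, size=data.shape)    # heavy-tailed replacements
    return np.where(replaced, heavy, data), replaced


rng = np.random.default_rng(0)
n = 1000
x = rng.normal(loc=2.0, scale=1.0, size=n)   # original (uncontaminated) sample
p_n = n ** -0.5                              # hypothetical rate with p_n -> 0
x_tilde, replaced = huber_contaminate(x, p_n, rng)
```

Posterior sampling would then be run on `x_tilde` rather than `x`, using a likelihood that accounts for the contamination; that step depends on the model and is not shown here.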

Paper Structure

This paper contains 40 sections, 32 theorems, 280 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Let $\bm{X}$, $\bm{Z}$ be neighbouring datasets of size $n$ generated by $\mathbb{P}_{\theta^*}$, the true model under contamination, with density function $k_{p_n}(\cdot;\theta^*)$. Under mild assumptions on the set of density functions $k_p$ and prior $\pi_0$ (Assumptions as:bracket_entropy-as:iden $\mathbb{P}_{\theta^*}$-almost surely, where $\mathbb{P}_{\pi_n}(S|\bm{X})$ denotes the probabilit
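
For orientation, the standard $(\epsilon,\delta)$-differential-privacy condition, written for posterior sampling in the notation above, is given below. This is the generic textbook inequality and is assumed here to agree with Definition 1. It is not a reconstruction of the truncated statement of Theorem 1, whose exact form and constants are not visible in this excerpt.

$$
\mathbb{P}_{\pi_n}(S \mid \bm{X}) \;\le\; e^{\epsilon_n}\, \mathbb{P}_{\pi_n}(S \mid \bm{Z}) + \delta_n
\quad \text{for all measurable sets } S \text{ and all neighbouring datasets } \bm{X}, \bm{Z}.
$$

Smaller $\epsilon_n$ and $\delta_n$ mean that the posterior changes less when a single data point is swapped, so $(\epsilon_n,\delta_n) \to (0,0)$ corresponds to the asymptotically zero privacy loss described in the TL;DR.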

Figures (2)

  • Figure 1: Plots of estimated $(\epsilon_n,\delta_n)$ for three different model setups, with parameter dimension $51$. The average and maximum across all repeats are plotted and distinguished by dot shape. Each colour and line type pair corresponds to the result of one choice of the constant.
  • Figure 2: Plots of estimated $(\epsilon_n,\delta_n)$ for the three setups with parameter dimension $5$.

Theorems & Definitions (99)

  • Theorem 1
  • Definition 1: $(\epsilon,\delta)$-Differentially Private Posterior Sampling
  • Definition 2: Neighbouring Datasets
  • Remark 1
  • Remark 2
  • Remark 3
  • Proposition 2
  • Proof: Proof for Proposition \ref{prop:posterior_decomp}
  • Definition 3: Hellinger Distance
  • Definition 4: $r$-Hellinger Bracketing
  • ...and 89 more