Thompson Sampling Itself is Differentially Private

Tingting Ou; Marco Avella Medina; Rachel Cummings

TL;DR

This paper proves that Thompson Sampling with Gaussian priors is differentially private in its vanilla form, using Gaussian differential privacy (GDP) to obtain per-round privacy and composition across $T$ rounds, and showing that standard regret bounds still hold. It further introduces two lightweight modifications (pre-pulling each arm $b$ times and scaling the sampling variance by $c$) to achieve a tunable privacy-regret trade-off, with explicit GDP and regret guarantees that depend on $b$ and $c$. The authors provide a unified regret analysis for the modified algorithm and demonstrate, via experiments with Bernoulli and truncated exponential rewards, that intermediate values of $b$ and $c$ yield the best privacy-regret balance under a fixed privacy budget. Overall, the work shows that privacy can be achieved in bandit learning without sacrificing baseline performance, and it offers a practical way to trade some regret for tighter privacy guarantees when needed.

Abstract

In this work we first show that the classical Thompson sampling algorithm for multi-armed bandits is differentially private as-is, without any modification. We provide per-round privacy guarantees as a function of problem parameters and show composition over $T$ rounds; since the algorithm is unchanged, existing $O(\sqrt{NT\log N})$ regret bounds still hold and there is no loss in performance due to privacy. We then show that simple modifications, such as pre-pulling all arms a fixed number of times or increasing the sampling variance, can provide tighter privacy guarantees. We again provide privacy guarantees that now depend on the new parameters introduced in the modification, which allows the analyst to tune the privacy guarantee as desired. We also provide a novel regret analysis for this new algorithm, and show how the new parameters also impact expected regret. Finally, we empirically validate and illustrate our theoretical findings in two parameter regimes and demonstrate that tuning the new parameters substantially improves the privacy-regret tradeoff.
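The two modifications can be illustrated with a minimal sketch. It assumes the common Gaussian-prior form of Thompson Sampling, in which arm $i$ with $k_i$ pulls and reward sum $S_i$ is scored by a draw from $\mathcal{N}(S_i/(k_i+1),\, c/(k_i+1))$, and it assumes rewards lie in $[0,1]$. The function name, the `pull` callback, and the round-robin order of the pre-pulls are illustrative choices, not taken from the paper.

```python
import numpy as np


def modified_thompson_sampling(pull, n_arms, horizon, b=0, c=1.0, seed=0):
    """Gaussian Thompson Sampling with b pre-pulls per arm and variance scale c.

    `pull(arm)` is a hypothetical callback returning a reward in [0, 1].
    With b=0 and c=1.0 the loop reduces to the unmodified sampling rule.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_arms)  # k_i: number of times arm i has been pulled
    sums = np.zeros(n_arms)    # S_i: cumulative reward collected from arm i
    history = []

    for t in range(horizon):
        if t < b * n_arms:
            # Pre-pull phase (assumed round-robin): each arm is pulled b times.
            arm = t % n_arms
        else:
            # Sampling phase: one Gaussian draw per arm, variance scaled by c.
            means = sums / (counts + 1.0)
            stds = np.sqrt(c / (counts + 1.0))
            arm = int(np.argmax(rng.normal(means, stds)))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        history.append(arm)
    return history, counts
```

Under these assumptions, changing one reward moves an arm's mean by at most $1/(k_i+1)$ while the sampling standard deviation is $\sqrt{c/(k_i+1)}$, so larger $b$ (which forces $k_i \geq b$) and larger $c$ both shrink the ratio of sensitivity to noise.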

Paper Structure (27 sections, 26 theorems, 61 equations, 8 figures, 4 algorithms)

Introduction
Related Work
Model and Preliminaries
Thompson Sampling
Differential Privacy
Thompson Sampling is DP
Improving the Privacy-Regret Trade-off
Privacy Guarantees
Regret Guarantees
Experiments
Bernoulli rewards
Truncated exponential rewards
Conclusion
Omitted Privacy Proofs
Proof of Lemma \ref{lem.tsgdp-onestep}
...and 12 more sections

Key Result

Lemma 1

A mechanism $\mathcal{M}$ is $\eta$-GDP if and only if it is $(\epsilon, \delta(\epsilon))$-DP for all $\epsilon \geq 0$, where $\delta(\epsilon) = \Phi\left(-\frac{\epsilon}{\eta} + \frac{\eta}{2}\right) - e^{\epsilon}\, \Phi\left(-\frac{\epsilon}{\eta} - \frac{\eta}{2}\right)$ and $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$ is the standard normal CDF.
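The conversion in Lemma 1 is straightforward to evaluate numerically. The sketch below is a direct transcription of the formula for $\delta(\epsilon)$; the function name `gdp_delta` is an illustrative choice, and the code assumes moderate values of $\epsilon$ so that $e^{\epsilon}$ does not overflow.

```python
from math import erfc, exp, sqrt


def normal_cdf(x):
    """Standard normal CDF, written with erfc so the lower tail stays accurate."""
    return 0.5 * erfc(-x / sqrt(2.0))


def gdp_delta(eta, eps):
    """delta(eps) such that an eta-GDP mechanism is (eps, delta(eps))-DP."""
    return normal_cdf(-eps / eta + eta / 2.0) - exp(eps) * normal_cdf(-eps / eta - eta / 2.0)
```

For example, `gdp_delta(1.0, 0.0)` evaluates $\Phi(0.5) - \Phi(-0.5)$, and the returned value decreases as `eps` grows.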

Figures (8)

Figure 1: DP parameter $\epsilon$ as a function of $\delta$ when fixing $T=1000$ and varying $b$ (left) and $c$ (right). Note that the per-round GDP parameter is $\sqrt{\frac{1}{c(b+1)}}$, so the roles of $c$ and $b$ are nearly symmetric, leading to nearly identical plots on the left and right. Of course, these parameters can also be varied together.
Figure 2: Empirical regret of Algorithm \ref{alg.ts.modified} under varying $(b,c)$ when rewards are Bernoulli distributed.
Figure 3: Total regret at $T=100,000$ of the modified Thompson Sampling algorithm under varying privacy guarantees, when rewards are Bernoulli distributed.
Figure 4: Empirical regret of Algorithm \ref{alg.ts.modified} under varying $(b,c)$ when rewards are generated from a truncated exponential distribution.
Figure 5: Total regret at $T=100,000$ of the modified Thompson Sampling algorithm under varying privacy guarantees, when rewards are generated from truncated exponential distributions.
...and 3 more figures
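
The kind of curve described in the Figure 1 caption can be approximated with a short sketch. It takes the per-round GDP parameter $\sqrt{1/(c(b+1))}$ from the caption and, as an assumption, composes all $T$ rounds with the standard GDP rule (the parameter grows by a factor of $\sqrt{T}$); the paper's exact theorem may count rounds differently. The function names and the bisection inversion are illustrative, and the code is meant for moderate total GDP values where $e^{\epsilon}$ stays within floating-point range.

```python
from math import erfc, exp, sqrt


def gdp_delta(eta, eps):
    """delta(eps) for an eta-GDP mechanism (GDP-to-DP conversion)."""
    cdf = lambda x: 0.5 * erfc(-x / sqrt(2.0))  # standard normal CDF
    return cdf(-eps / eta + eta / 2.0) - exp(eps) * cdf(-eps / eta - eta / 2.0)


def total_gdp(T, b, c):
    """Assumed composition: sqrt(T) times the per-round parameter sqrt(1/(c(b+1)))."""
    return sqrt(T / (c * (b + 1)))


def epsilon_for_delta(eta, delta, hi=700.0, iters=200):
    """Smallest eps >= 0 with gdp_delta(eta, eps) <= delta, found by bisection.

    Relies on delta(eps) being decreasing in eps; hi=700 keeps exp(eps) finite.
    """
    if gdp_delta(eta, 0.0) <= delta:
        return 0.0
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gdp_delta(eta, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi


# Example: T = 1000 with b = 99 and c = 10 gives a total GDP parameter of about 1.
eta_total = total_gdp(1000, 99, 10.0)
eps = epsilon_for_delta(eta_total, 1e-5)
```

Because $b$ and $c$ enter only through the product $c(b+1)$, sweeping either one with the other held fixed traces out nearly the same $\epsilon$ values, which matches the symmetry noted in the caption.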

Theorems & Definitions (38)

Definition 1
Definition 2: DRS19
Lemma 1: DRS19
Lemma 2: DRS19
Theorem 1: Informal version of Theorem \ref{thm.tsdp}
Theorem 2: AG17
Lemma 3
Lemma 4
Theorem 3
Remark 1
...and 28 more
