Concentrated Differential Privacy for Bandits

Achraf Azize, Debabrota Basu

TL;DR

This work studies bandits under centralized zero-Concentrated Differential Privacy (zCDP) with a trusted decision-maker, introducing three private algorithms (AdaC-UCB, AdaC-GOPE, and AdaC-OFUL) for finite-armed, linear, and linear contextual bandits, respectively. Each algorithm combines the Gaussian mechanism with adaptive episodes to achieve $\rho$-Interactive zCDP, with a regret cost of privacy that is asymptotically negligible compared to the non-private regret; in particular, the privacy cost scales as $\tilde{O}(\rho^{-1/2}\log T)$ for fixed $\rho$. The authors also derive minimax lower bounds via a novel transport-based KL decomposition, showing two privacy regimes and that privacy can be achieved for free when $\rho = \Omega(T^{-1})$. Experiments confirm the theory across all three settings. Together, these results map out the privacy-utility trade-off for bandits under zCDP with a centralized DP mechanism, showing that the privacy guarantees come at minimal performance loss.

Abstract

Bandits serve as the theoretical foundation of sequential learning and an algorithmic foundation of modern recommender systems. However, recommender systems often rely on user-sensitive data, making privacy a critical concern. This paper contributes to the understanding of Differential Privacy (DP) in bandits with a trusted centralised decision-maker, and especially the implications of ensuring zero Concentrated Differential Privacy (zCDP). First, we formalise and compare different adaptations of DP to bandits, depending on the considered input and the interaction protocol. Then, we propose three private algorithms, namely AdaC-UCB, AdaC-GOPE and AdaC-OFUL, for three bandit settings, namely finite-armed bandits, linear bandits, and linear contextual bandits. The three algorithms share a generic algorithmic blueprint, i.e. the Gaussian mechanism and adaptive episodes, to ensure a good privacy-utility trade-off. We analyse and upper bound the regret of these three algorithms. Our analysis shows that in all of these settings, the prices of imposing zCDP are (asymptotically) negligible in comparison with the regrets incurred oblivious to privacy. Next, we complement our regret upper bounds with the first minimax lower bounds on the regret of bandits with zCDP. To prove the lower bounds, we elaborate a new proof technique based on couplings and optimal transport. We conclude by experimentally validating our theoretical results for the three different settings of bandits.
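The blueprint named in the abstract (Gaussian mechanism plus episodes) can be illustrated with a minimal sketch. It relies on the standard fact that adding $\mathcal{N}(0, \sigma^2)$ noise to a query with $L_2$ sensitivity $\Delta$ satisfies $\frac{\Delta^2}{2\sigma^2}$-zCDP, so $\sigma = \Delta / \sqrt{2\rho}$ gives $\rho$-zCDP. The function names, the assumption that rewards lie in $[0, 1]$, and the doubling episode schedule are all hypothetical choices made for illustration; they are not taken from the paper's algorithms.

```python
import math
import random


def gaussian_sigma(sensitivity: float, rho: float) -> float:
    """Noise scale for which the Gaussian mechanism satisfies rho-zCDP."""
    # Standard zCDP fact: N(0, sigma^2) noise on a query with L2 sensitivity
    # `sensitivity` yields (sensitivity^2 / (2 * sigma^2))-zCDP.
    return sensitivity / math.sqrt(2.0 * rho)


def private_episode_mean(rewards, rho, rng):
    """Noisy mean of one episode's rewards (assumed to lie in [0, 1])."""
    n = len(rewards)
    # Changing a single reward in [0, 1] moves the mean by at most 1 / n.
    sigma = gaussian_sigma(1.0 / n, rho)
    return sum(rewards) / n + rng.gauss(0.0, sigma)


def doubling_episode_lengths(horizon):
    """Episode lengths 1, 2, 4, ... truncated so that they sum to `horizon`.

    Illustrative assumption: the paper's actual episode schedule may differ.
    """
    lengths, used, size = [], 0, 1
    while used < horizon:
        step = min(size, horizon - used)
        lengths.append(step)
        used += step
        size *= 2
    return lengths


# Usage: privatise the mean of 64 simulated rewards with budget rho = 0.5.
rng = random.Random(0)
rewards = [rng.random() for _ in range(64)]
noisy_mean = private_episode_mean(rewards, rho=0.5, rng=rng)
```

In this sketch each noisy mean is computed from the rewards of a single episode only, so any one reward influences exactly one released statistic. The noise scale $1 / (n\sqrt{2\rho})$ shrinks as the episode length $n$ grows, which gives some intuition for why the cost of privacy can become small relative to the non-private regret.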

Paper Structure

This paper contains 65 sections, 36 theorems, 200 equations, 6 figures, 2 tables, 4 algorithms.

Key Result

Proposition 1

For any policy $\pi$, we have that [...], where $\Pi_{\text{Table}}^{(\epsilon, \delta)}$ and $\Pi_{\text{View}}^{(\epsilon, \delta)}$ are the classes of all policies verifying $(\epsilon, \delta)$-Table DP and $(\epsilon, \delta)$-View DP, respectively.

Figures (6)

  • Figure 1: Table DP
  • Figure 2: View DP
  • Figure 3: Sequential interaction between the policy, an adversary, and a table of rewards.
  • Figure 4: For each bandit setting, the left figure represents the evolution of the difference between the private and non-private regret w.r.t. the privacy budget $\rho$. The right figure represents the evolution of the price of privacy (PoP) w.r.t. the time step.
  • Figure 5: Evolution of the regret over time for $\mathsf{AdaC\text{-}GOPE}$ and Adar-GOPE-Var for different values of the privacy budget $\rho$
  • ...and 1 more figure

Theorems & Definitions (78)

  • Example 1: DoctorBandit
  • Definition 1: $(\epsilon, \delta)$-DP [dwork2014algorithmic] and $\rho$-zCDP [ZeroDP]
  • Definition 2
  • Remark 1
  • Definition 3: Table DP and View DP
  • Proposition 1: Relation between Table DP and View DP
  • Definition 4: Interactive DP
  • Remark 2
  • Proposition 2
  • Theorem 1: Group Privacy for $\rho$-Interactive DP
  • ...and 68 more