Differentially Private Best-Arm Identification

Achraf Azize; Marc Jourdan; Aymen Al Marjani; Debabrota Basu

TL;DR

The paper tackles the problem of Best-Arm Identification under fixed confidence with differential privacy, addressing both ε-local and ε-global models. It derives two regime-based lower bounds that reveal privacy-utility trade-offs governed by TV and KL-type characteristic times, and then designs private Top-Two variants CTB-TT (local DP) and AdaP-TT/AdaP-TT* (global DP) that match these lower bounds up to constants. The CTB-TT algorithm leverages a private Convert-To-Bernoulli estimator, while AdaP-TT uses an adaptive doubling-forgetting private mean estimator, with AdaP-TT* further refining transportation costs to align with the global-DP lower bound. Experimental results validate the theoretical findings, showing clear high- and low-privacy regimes and demonstrating the practical competitiveness of the proposed methods against DP-SE and non-private baselines. The work provides a rigorous privacy-utility framework for FC-BAI and offers scalable, private algorithms with provable guarantees for data-sensitive sequential experimentation.

Abstract

Best Arm Identification (BAI) problems are progressively used for data-sensitive applications, such as designing adaptive clinical trials, tuning hyper-parameters, and conducting user studies. Motivated by the data privacy concerns invoked by these applications, we study the problem of BAI with fixed confidence in both the local and central models, i.e. $ε$-local and $ε$-global Differential Privacy (DP). First, to quantify the cost of privacy, we derive lower bounds on the sample complexity of any $δ$-correct BAI algorithm satisfying $ε$-global DP or $ε$-local DP. Our lower bounds suggest the existence of two privacy regimes. In the high-privacy regime, the hardness depends on a coupled effect of privacy and novel information-theoretic quantities involving the Total Variation. In the low-privacy regime, the lower bounds reduce to the non-private lower bounds. We propose $ε$-local DP and $ε$-global DP variants of a Top Two algorithm, namely CTB-TT and AdaP-TT*, respectively. For $ε$-local DP, CTB-TT is asymptotically optimal by plugging in a private estimator of the means based on Randomised Response. For $ε$-global DP, our private estimator of the mean runs in arm-dependent adaptive episodes and adds Laplace noise to ensure a good privacy-utility trade-off. By adapting the transportation costs, the expected sample complexity of AdaP-TT* reaches the asymptotic lower bound up to multiplicative constants.
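The abstract states that CTB-TT plugs in a private estimator of the means based on Randomised Response. The following is a minimal sketch of that general idea, not the paper's exact estimator. It assumes rewards in [0, 1], converts each reward to a Bernoulli bit (our reading of "Convert-To-Bernoulli", an assumption), releases the bit through binary randomized response, and debiases the average. All function names are hypothetical.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def privatize_reward(reward, epsilon, rng):
    """Release one reward in [0, 1] as a single epsilon-locally-private bit."""
    # Assumed conversion step: a Bernoulli(reward) bit has the same mean as the reward.
    bit = 1 if rng.random() < reward else 0
    # Randomized response: keep the bit w.p. e^eps / (1 + e^eps), flip it otherwise.
    if rng.random() < keep_probability(epsilon):
        return bit
    return 1 - bit


def debiased_mean(mean_of_reports, epsilon):
    """Invert E[report] = (2p - 1) * mu + (1 - p) to estimate mu without bias."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive to debias the reports")
    p = keep_probability(epsilon)
    return (mean_of_reports - (1.0 - p)) / (2.0 * p - 1.0)


def private_mean_estimate(rewards, epsilon, rng=None):
    """Estimate the mean of `rewards` from privatized bits only."""
    rng = rng or random.Random()
    reports = [privatize_reward(r, epsilon, rng) for r in rewards]
    return debiased_mean(sum(reports) / len(reports), epsilon)
```

As ε shrinks, the factor 2p − 1 approaches zero, so the debiasing step inflates the variance of the estimate by 1/(2p − 1)². This gives a rough intuition for why a high-privacy regime can be harder than the non-private problem.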

Paper Structure

This paper contains 51 sections, 35 theorems, 176 equations, 5 figures, 1 table, 6 algorithms.

Introduction
Contributions
Lower Bounds
Algorithm Design
Upper Bounds
Outline
Differential Privacy and Best-Arm Identification
Background: Differential Privacy
Background: Best Arm Identification in the Fixed-Confidence Setting
The Best Arm Identification Problem
Lower Bound on the Expected Sample Complexity
The TTUCB Meta-algorithm
Problem Statement: FC-BAI with DP
Local DP FC-BAI
Global DP BAI
...and 36 more sections

Key Result

Theorem 2

Let $f: \mathcal{X} \rightarrow \mathbb{R}^d$ be an algorithm with sensitivity $s(f) \mathrel{\triangleq} \underset{\substack{\mathcal{D}, \mathcal{D'} \text{ s.t. }|\mathcal{D} - \mathcal{D'}|_{\mathrm{Hamming}} = 1}}{\max} \left\|f(\mathcal{D}) - f(\mathcal{D'})\right\|_1$, where $\left\|\cdot\right\|_1$ …
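Theorem 2 is listed on this page as the Laplace mechanism (Theorem 3.6 of dwork2014algorithmic). In its standard form, adding independent Laplace noise of scale $s(f)/\epsilon$ to each coordinate of $f(\mathcal{D})$ yields $\epsilon$-DP. Below is a minimal standard-library sketch of that mechanism, applied to the empirical mean of rewards assumed to lie in [0, 1]. The function names and the [0, 1] range are illustrative assumptions, and the paper's episode-based estimator is not reproduced here.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    # expovariate takes the rate (1 / mean), so both draws have mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(values, sensitivity, epsilon, rng=None):
    """Add Laplace(sensitivity / epsilon) noise to each coordinate of `values`."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


def private_mean(rewards, epsilon, rng=None):
    """Release the mean of rewards in [0, 1] (assumed range) under epsilon-DP."""
    n = len(rewards)
    # Replacing one reward in [0, 1] moves the mean by at most 1 / n,
    # so the L1 sensitivity of the mean is 1 / n.
    return laplace_mechanism([sum(rewards) / n], 1.0 / n, epsilon, rng)[0]
```

The noise scale here is 1/(nε), so the privacy cost to the mean estimate fades as the number of samples grows and dominates when ε is small.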

Figures (5)

Figure 1: Empirical stopping time $\tau_{\delta}$ (mean $\pm$ std. over 1000 runs, $\delta = 10^{-2}$) with respect to the privacy budget $\epsilon$ for $\epsilon$-local DP on Bernoulli instance $\mu_{1}$ (left) and $\mu_{2}$ (right). The shaded vertical line separates the two privacy regimes.
Figure 2: Empirical stopping time $\tau_{\delta}$ (mean $\pm$ std. over 1000 runs) with respect to the privacy budget $\epsilon$ for $\epsilon$-global DP on Bernoulli instance $\mu_{1}$ (left) and $\mu_{2}$ (right). The shaded vertical line separates the two privacy regimes.
Figure 3: Evolution of the stopping time $\tau$ (mean $\pm$ std. over 1000 runs) of CTB-TT and TTUCB with respect to the privacy budget $\epsilon$ for $\delta = 10^{-2}$ on different Bernoulli instances. The shaded vertical line separates the two privacy regimes.
Figure 4: Evolution of the stopping time $\tau$ (mean $\pm$ std. over 1000 runs) of Imp-$\mathsf{AdaP\text{-}TT}$, $\mathsf{AdaP\text{-}TT}$, DP-SE, and TTUCB with respect to the privacy budget $\epsilon$ for $\delta = 10^{-2}$ on different Bernoulli instances. The shaded vertical line separates the two privacy regimes. Both the $x$-axis and $y$-axis are in logarithmic scale.
Figure 5: Evolution of the stopping time $\tau$ (mean $\pm$ std. over 1000 runs) of $\mathsf{AdaP\text{-}TT}^\star$, $\mathsf{AdaP\text{-}TT}$, DP-SE, and TTUCB with respect to the privacy budget $\epsilon$ for $\delta = 10^{-2}$ on different Bernoulli instances. The shaded vertical line separates the two privacy regimes. Only the $x$-axis is in logarithmic scale.

Theorems & Definitions (45)

Example 1: Adaptive dose finding trial
Definition 1: $(\epsilon, \delta)$-DP dwork2014algorithmic
Theorem 2: Laplace mechanism, Theorem 3.6 dwork2014algorithmic
Definition 3: $\epsilon$-local DP duchi2013local
Lemma 4: garivier2016optimal
Remark 5
Definition 6: $\epsilon$-local DP for BAI
Definition 7: $\epsilon$-global DP for BAI
Remark 8
Theorem 9
...and 35 more
