
Best Arm Identification with Possibly Biased Offline Data

Le Yang, Vincent Y. F. Tan, Wang Chi Cheung

TL;DR

This work studies best arm identification (BAI) under potentially biased offline data in a fixed-confidence setting. It shows an impossibility result for fully adaptive offline data usage without a bound on the offline-online bias and proposes LUCB-H, a bias-aware LUCB-based policy that forms mixed confidence bounds using a finite bias bound vector $V$. LUCB-H achieves a near-optimal stopping time that matches standard LUCB when offline data is misleading and can substantially reduce online samples when offline data is informative, supported by an instance-dependent lower bound that matches the upper bound in certain regimes. Numerical experiments on a five-arm Gaussian bandit demonstrate LUCB-H’s adaptive behavior: it discards misleading offline data and aggressively exploits helpful data, with robustness to moderate misspecifications of the bias bound.
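The idea of a mixed confidence bound built from a finite bias bound can be illustrated with a minimal Python sketch. It is not the authors' LUCB-H bound. The helpers `radius` and `mixed_bounds`, the per-arm entry $V_a$ of $V$, the pooling rule, and the unit-variance sub-Gaussian reward model are all illustrative assumptions. The reasoning it encodes is this: if an arm's offline mean differs from its online mean by at most $V_a$, then pooling $n_{\text{on}}$ online and $n_{\text{off}}$ offline samples biases the estimate by at most $\frac{n_{\text{off}}}{n_{\text{on}}+n_{\text{off}}} V_a$. A pooled interval widened by that amount therefore remains valid and can be intersected with the online-only interval.

```python
import math


def radius(n: int, delta: float) -> float:
    """Placeholder confidence radius for unit-variance sub-Gaussian samples.

    Hypothetical: a fixed-sample Hoeffding-style width, not the paper's
    anytime exploration rate.
    """
    if n == 0:
        return math.inf  # no samples: the interval is unbounded
    return math.sqrt(2.0 * math.log(1.0 / delta) / n)


def mixed_bounds(mu_on: float, n_on: int, mu_off: float, n_off: int,
                 V: float, delta: float) -> tuple[float, float]:
    """Hypothetical bias-aware (LCB, UCB) for a single arm.

    Intersects the online-only interval with a pooled interval that is
    widened by the worst-case bias the offline samples can introduce.
    """
    # Online-only interval (unbounded when the arm has no online samples).
    r_on = radius(n_on, delta)
    lcb_on, ucb_on = mu_on - r_on, mu_on + r_on

    n = n_on + n_off
    if n == 0:
        return -math.inf, math.inf
    pooled = (n_on * mu_on + n_off * mu_off) / n
    # Offline samples can shift the pooled mean by at most (n_off / n) * V.
    # Guard n_off == 0 so that 0 * inf does not produce nan.
    bias = (n_off / n) * V if n_off > 0 else 0.0
    r_pool = radius(n, delta) + bias

    # If both intervals contain the true online mean, so does their intersection.
    return max(lcb_on, pooled - r_pool), min(ucb_on, pooled + r_pool)
```

With $V_a = 0$, the pooled interval dominates and the offline samples shrink the width. With a very large $V_a$, the intersection falls back to the online-only interval. This loosely mirrors the "discard misleading, exploit helpful" behavior the TL;DR attributes to LUCB-H.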

Abstract

We study the best arm identification (BAI) problem with potentially biased offline data in the fixed confidence setting, which commonly arises in real-world scenarios such as clinical trials. We prove an impossibility result for adaptive algorithms without prior knowledge of the bias bound between online and offline distributions. To address this, we propose the LUCB-H algorithm, which introduces adaptive confidence bounds by incorporating an auxiliary bias correction to balance offline and online data within the LUCB framework. Theoretical analysis shows that LUCB-H matches the sample complexity of standard LUCB when offline data is misleading and significantly outperforms it when offline data is helpful. We also derive an instance-dependent lower bound that matches the upper bound of LUCB-H in certain scenarios. Numerical experiments further demonstrate the robustness and adaptability of LUCB-H in effectively incorporating offline data.
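For reference, here is a minimal Python sketch of the plain LUCB sampling and stopping loop that LUCB-H is built on. It uses online samples only. The function names, the arm means in the usage line, and the exploration rate are assumptions: the rate is an LUCB1-style rate adapted to unit-variance Gaussian rewards, not the paper's. In each round, the loop takes the empirical best arm $h$ and the strongest challenger $l$ (the highest UCB among the other arms). It stops when $\mathrm{LCB}(h) > \mathrm{UCB}(l)$ and otherwise samples both. According to the abstract, LUCB-H changes the confidence bounds rather than this loop, incorporating a bias correction so that offline data can be used.

```python
import math
import random


def radius(n: int, t: int, K: int, delta: float) -> float:
    # LUCB1-style exploration rate; the unit-variance Gaussian constant is an assumption.
    return math.sqrt(2.0 * math.log(5.0 * K * t**4 / (4.0 * delta)) / n)


def lucb(means: list[float], delta: float, seed: int = 0,
         max_rounds: int = 100_000) -> tuple[int, int]:
    """Run plain LUCB on a simulated Gaussian bandit with the given means.

    Returns (recommended arm, total number of online samples drawn).
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K
    sums = [0.0] * K

    def pull(a: int) -> None:
        sums[a] += rng.gauss(means[a], 1.0)  # unit-variance Gaussian reward
        counts[a] += 1

    for a in range(K):  # initialise with one sample per arm
        pull(a)

    for t in range(1, max_rounds + 1):
        mu = [sums[a] / counts[a] for a in range(K)]
        rad = [radius(counts[a], t, K, delta) for a in range(K)]
        h = max(range(K), key=lambda a: mu[a])  # empirical best arm
        # Strongest challenger: highest upper confidence bound among the others.
        l = max((a for a in range(K) if a != h), key=lambda a: mu[a] + rad[a])
        if mu[h] - rad[h] > mu[l] + rad[l]:  # intervals separated: stop
            return h, sum(counts)
        pull(h)
        pull(l)
    raise RuntimeError("max_rounds reached without stopping")


# Hypothetical five-arm instance; arm 0 is the best.
best, n_samples = lucb([1.0, 0.5, 0.0, -0.5, -1.0], delta=0.05, seed=0)
```

Sampling both $h$ and $l$ shrinks exactly the two intervals whose overlap blocks the stopping rule. This is why tighter bounds for those arms, for example from helpful offline data, translate directly into fewer online samples.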

Paper Structure

This paper contains 14 sections, 6 theorems, 45 equations, 10 figures, and 1 algorithm.

Key Result

Proposition 3.1

Consider instance $I_P$ as described above. Suppose the offline sample sizes satisfy $T_S(1) \in \mathbb{N}$ and there exists $\epsilon > 0$ such that $T_S(2) \leq \frac{C}{2}\left(\delta^{-2\beta} - \delta^{-2\beta+\epsilon}\right)$, where $C > 0$ is an absolute constant and $\delta \in (0,1)$. Then, for any … on instance $I_P$, there exists an instance $I_Q$ defined as … such that, as $\delta \to 0^+$, the pol…
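The offline sample-size condition in Proposition 3.1 can be evaluated directly, as the small Python sketch below shows. The function name is hypothetical. $C$ and $\beta$ are taken as inputs because the statement shown here does not fix their values. Note that the bound is positive for every $\delta \in (0,1)$ and $\epsilon > 0$, since $\delta^{-2\beta+\epsilon} = \delta^{-2\beta}\,\delta^{\epsilon} < \delta^{-2\beta}$.

```python
def offline_size_threshold(C: float, delta: float, beta: float, eps: float) -> float:
    """Evaluate (C / 2) * (delta**(-2*beta) - delta**(-2*beta + eps)).

    C and beta are inputs because their values are not fixed in the
    statement shown here.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if C <= 0 or eps <= 0:
        raise ValueError("C and eps must be positive")
    return 0.5 * C * (delta ** (-2.0 * beta) - delta ** (-2.0 * beta + eps))


# Hypothetical parameter values: for these, the threshold grows as delta shrinks.
for d in (1e-1, 1e-2, 1e-3):
    print(d, offline_size_threshold(C=1.0, delta=d, beta=0.5, eps=0.5))
```

As a worked check, $C = 2$, $\beta = 1$, $\epsilon = 1$ and $\delta = 0.5$ give $\frac{2}{2}\left(0.5^{-2} - 0.5^{-1}\right) = 4 - 2 = 2$.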

Figures (10)

  • Figure 1: Evolution of $\mathbb{E}[\tau_{\delta}]$ with $\log(1/\delta)$ in group $1$
  • Figure 2: Evolution of $\mathbb{E}[\tau_{\delta}]$ with $\log(1/\delta)$ in group $2$
  • Figure 3: Evolution of $\mathbb{E}[\tau_{\delta}]$ with $T_{S}$ in group $1$
  • Figure 4: Evolution of $\mathbb{E}[\tau_{\delta}]$ with $T_{S}$ in group $2$
  • Figure 5: Evolution of $\mathbb{E}[\tau_{\delta}]$ with $V$ in group $1$
  • ...and 5 more figures

Theorems & Definitions (7)

  • Proposition 3.1
  • Theorem 4.1
  • Theorem 5.1
  • Remark 5.1
  • Proposition A.1
  • Lemma A.1
  • Lemma C.1