
UCB Exploration for Fixed-Budget Bayesian Best Arm Identification

Rong J. B. Zhu, Yanqi Qiu

TL;DR

The key idea is to learn prior information, which can improve UCB-based BAI algorithms just as it has in cumulative regret minimization. The paper also establishes bounds on the failure probability and the simple regret for the Bayesian BAI problem.

Abstract

We study best-arm identification (BAI) in the fixed-budget setting. Adaptive allocations based on upper confidence bounds (UCBs), such as UCBE, are known to work well in BAI. However, their optimal regret is well known to be instance-dependent in theory, which we show to be an artifact in many fixed-budget BAI problems. In this paper we propose a UCB exploration algorithm that is both theoretically and empirically efficient for the fixed-budget BAI problem under a Bayesian setting. The key idea is to learn prior information, which can enhance the performance of UCB-based BAI algorithms as it has done in the cumulative regret minimization problem. We establish bounds on the failure probability and the simple regret for the Bayesian BAI problem, providing upper bounds of order $\tilde{O}(\sqrt{K/n})$, up to logarithmic factors, where $n$ represents the budget and $K$ denotes the number of arms. Furthermore, we demonstrate through empirical results that our approach consistently outperforms state-of-the-art baselines.
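To make the UCB-style allocation mentioned in the abstract concrete, here is a minimal sketch of the classical UCB-E baseline, not the algorithm proposed in the paper. It pulls the arm with the largest index (empirical mean plus $\sqrt{a/T_k}$) and recommends the arm with the highest empirical mean once the budget is spent. The function name `ucbe`, the Gaussian reward model, and the exploration parameter `a` are assumptions chosen for illustration.

```python
import math
import random


def ucbe(means, budget, a, sigma=1.0, seed=0):
    """Illustrative UCB-E run: returns (recommended arm, pull counts).

    `means` are the true arm means (hidden from the learner in practice),
    `budget` is the total number of pulls n (must be >= number of arms),
    `a` is the exploration parameter (an assumed tuning value).
    """
    rng = random.Random(seed)
    num_arms = len(means)
    if budget < num_arms:
        raise ValueError("budget must allow one pull per arm")
    counts = [0] * num_arms
    sums = [0.0] * num_arms

    def pull(k):
        # Assumed reward model: Gaussian noise around the arm's mean.
        sums[k] += rng.gauss(means[k], sigma)
        counts[k] += 1

    # Pull every arm once so that all indices are well defined.
    for k in range(num_arms):
        pull(k)

    # Spend the remaining budget on the arm with the largest UCB index.
    for _ in range(budget - num_arms):
        k = max(
            range(num_arms),
            key=lambda i: sums[i] / counts[i] + math.sqrt(a / counts[i]),
        )
        pull(k)

    # Recommend the arm with the highest empirical mean.
    best = max(range(num_arms), key=lambda i: sums[i] / counts[i])
    return best, counts


if __name__ == "__main__":
    best, counts = ucbe([0.0, 0.2, 0.5], budget=600, a=2.0)
    print("recommended arm:", best, "pull counts:", counts)
```

In the Bayesian setting of the paper the means would themselves be drawn from a prior; the sketch takes them as given and leaves any prior learning out.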

Paper Structure (23 sections, 7 theorems, 92 equations, 4 figures, 1 algorithm)

Key Result

Theorem 3.1

Assume the means $\mu_k$, for $k\in[K]$, are drawn i.i.d. from $\mathcal{N}(\mu_0,\sigma_0^2)$. Then for $\alpha>0$, where $c_K=4\sqrt{2}\ln K\sqrt{\ln \left(\frac{K}{4\sqrt{2\pi}\ln K}\right)} + \frac{2}{\sqrt{2\pi}}$.
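As a quick numerical aid, the constant $c_K$ can be evaluated directly. The sketch below takes the expression exactly as printed above; the function name `c_k` is hypothetical. One observation that follows from the formula as written: the inner logarithm is non-negative only when $K \ge 4\sqrt{2\pi}\ln K$, which for integer $K$ holds from $K=36$ onward, so the sketch raises an error for smaller $K$ rather than returning a complex value.

```python
import math


def c_k(num_arms):
    """Evaluate c_K = 4*sqrt(2)*ln(K)*sqrt(ln(K / (4*sqrt(2*pi)*ln(K)))) + 2/sqrt(2*pi)."""
    if num_arms < 2:
        raise ValueError("K must be at least 2 so that ln K > 0")
    log_k = math.log(num_arms)
    ratio = num_arms / (4.0 * math.sqrt(2.0 * math.pi) * log_k)
    if ratio < 1.0:
        # The outer square root would be taken of a negative number.
        raise ValueError("c_K as printed is not real-valued for this K")
    return (
        4.0 * math.sqrt(2.0) * log_k * math.sqrt(math.log(ratio))
        + 2.0 / math.sqrt(2.0 * math.pi)
    )


if __name__ == "__main__":
    for k in (40, 80):
        print(k, c_k(k))
```

The values $K=40$ and $K=80$ match the arm counts used in Figure 4.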

Figures (4)

  • Figure 1: Random $\mu_k$. Upper panel: the average performance over 50 experiments. The bars denote the standard error of the mean over the 50 experiments. Lower panel: the difference in error between each baseline ($\tt SR$, $\tt SH$, and $\tt UCBE$) and $\tt RUE$ across all 50 experiments.
  • Figure 2: Fixed $\mu_k$. All standard errors are less than $0.016$ and not reported. Each experiment is run with budgets $H / 2$, $H$, and $2 H$ (the x-axis labels). The dotted lines mark error probabilities $0$ and $0.2$ for visual clarity only.
  • Figure 3: The error probability of the Two-Stage algorithm for various values of $q$, averaged over 1000 independent executions (all standard deviations are less than 0.016).
  • Figure 4: The error probability of the different algorithms in Setup F4 with more arms: 40 and 80 arms (left and right subfigures, respectively). The results are averaged over 1000 independent executions (all standard errors are less than 0.016 and not reported). For $K=80$, we cap the budget at $N=400000$ because of resource limits when $2H$ is too large.

Theorems & Definitions (9)

  • Theorem 3.1
  • Proposition 4.1
  • Theorem 5.1
  • Theorem 5.2
  • Lemma A.1
  • proof
  • Lemma A.2
  • proof
  • Lemma A.3