Fixed-Budget Constrained Best Arm Identification in Grouped Bandits

Raunak Mukherjee, Sharayu Moharir

TL;DR

This work proposes Feasibility Constrained Successive Rejects (FCSR), a novel algorithm that identifies the best arm while ensuring feasibility, and shows that it attains optimal dependence on problem parameters up to constant factors in the exponent.

Abstract

We study fixed-budget constrained best-arm identification in grouped bandits, where each arm consists of multiple independent attributes with stochastic rewards. An arm is considered feasible only if the means of all its attributes are above a given threshold. The aim is to find the feasible arm with the largest overall mean. We first derive a lower bound on the error probability of any algorithm in this setting. We then propose Feasibility Constrained Successive Rejects (FCSR), a novel algorithm that identifies the best arm while ensuring feasibility. We show that it attains optimal dependence on problem parameters up to constant factors in the exponent. Empirically, FCSR outperforms natural baselines while preserving feasibility guarantees.
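To make the target of the problem concrete, the following minimal sketch computes the arm an oracle with known means would report. It is an illustration of the problem definition only, not the FCSR algorithm. The function name `best_feasible_arm`, the example values, and the choice to treat an arm's overall mean as the average of its attribute means are assumptions made here, not details taken from the paper.

```python
from typing import Optional, Sequence


def best_feasible_arm(
    attribute_means: Sequence[Sequence[float]], threshold: float
) -> Optional[int]:
    """Return the index of the feasible arm with the largest overall mean.

    An arm is feasible only if every attribute mean is strictly above
    `threshold`. Assumption: the overall mean of an arm is the average of
    its attribute means. Returns None when no arm is feasible.
    """
    best_index: Optional[int] = None
    best_value = float("-inf")
    for index, means in enumerate(attribute_means):
        # Skip arms that violate the feasibility constraint.
        if not all(mean > threshold for mean in means):
            continue
        overall = sum(means) / len(means)
        if overall > best_value:
            best_index, best_value = index, overall
    return best_index


# Hypothetical instance with three arms and two attributes each.
example_means = [
    [0.90, 0.30],  # largest average, but the second attribute is below 0.4
    [0.60, 0.50],  # feasible, average 0.55
    [0.50, 0.45],  # feasible, average 0.475
]
chosen = best_feasible_arm(example_means, threshold=0.4)
```

In this hypothetical instance the arm with the largest average is infeasible, so an algorithm that ignores the constraint would report the wrong arm. A learner in the fixed-budget setting has to reach the same answer from a limited number of noisy samples rather than from the true means.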

Paper Structure

This paper contains 46 sections, 11 theorems, 119 equations, 2 figures, 1 table, 8 algorithms.

Key Result

Theorem 1

Let $\mathcal{C}_{FC}(a;K)$ denote the set of bandit instances with $K=M \geq 2$ whose difficulty $H_{FC}$ is upper bounded by some constant $a$. Then there exists a bandit instance $\mathcal{G} \in \mathcal{C}_{FC}(a;K)$ such that the probability that an arbitrary learner incorrectly reports the best arm is at least

Figures (2)

  • Figure 1: Performance comparison on synthetic instances. Each subplot shows $\ln(1-\mathrm{Accuracy})$ vs. budget for four algorithms (US, ETC, SR, FCSR). Shaded bands are $\pm1\sigma$ using the delta-method approximation $\mathrm{Var}(\ln(1-\hat{A})) \approx \tfrac{\hat{A}}{N(1-\hat{A})}$ with $N=2000$.
  • Figure 2: Accuracy of different algorithms at two budgets. Error bars show 95% confidence intervals for a Bernoulli mean using the empirical variance $\widehat{\mathrm{Var}}(X)=\hat{p}(1-\hat{p})$, i.e., $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/N}$ with $N=1000$ independent runs.
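The two error-bar formulas quoted in the figure captions can be sketched as follows. This is a minimal illustration of the stated formulas, not the authors' plotting code; the function names `log_error_band` and `bernoulli_ci95` and the example inputs are assumptions introduced here.

```python
import math
from typing import Tuple


def log_error_band(accuracy: float, runs: int) -> Tuple[float, float]:
    """Return (center, sigma) for ln(1 - accuracy).

    Uses the delta-method approximation from the Figure 1 caption:
    Var(ln(1 - A)) ~= A / (N * (1 - A)). Requires 0 <= accuracy < 1.
    """
    center = math.log(1.0 - accuracy)
    sigma = math.sqrt(accuracy / (runs * (1.0 - accuracy)))
    return center, sigma


def bernoulli_ci95(p_hat: float, runs: int) -> Tuple[float, float]:
    """Return the 95% interval p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / N),
    as described in the Figure 2 caption."""
    half_width = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / runs)
    return p_hat - half_width, p_hat + half_width


# Hypothetical inputs; the run counts match those stated in the captions.
center, sigma = log_error_band(accuracy=0.9, runs=2000)
lower, upper = bernoulli_ci95(p_hat=0.5, runs=1000)
```

The delta-method variance follows from $\mathrm{Var}(\hat{A}) = \hat{A}(1-\hat{A})/N$ multiplied by the squared derivative of $\ln(1-\hat{A})$, which is $1/(1-\hat{A})^2$. The approximation breaks down when the empirical accuracy equals 1, since $\ln(1-\hat{A})$ is undefined there.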

Theorems & Definitions (19)

  • Remark 1
  • Theorem 1: Lower Bound
  • Theorem 2: Performance of FCSR
  • Lemma 3
  • Lemma 4: Budget compliance
  • proof
  • Lemma 5: Bernoulli TBP Lower Bound
  • proof
  • Lemma 6: Risky Class Lower Bound
  • proof
  • ...and 9 more