Fixed-Budget Constrained Best Arm Identification in Grouped Bandits

Raunak Mukherjee; Sharayu Moharir

TL;DR

This work proposes Feasibility Constrained Successive Rejects (FCSR), a novel algorithm that identifies the best arm while ensuring feasibility, and shows that it attains optimal dependence on problem parameters up to constant factors in the exponent.

Abstract

We study fixed-budget constrained best-arm identification in grouped bandits, where each arm consists of multiple independent attributes with stochastic rewards. An arm is considered feasible only if all its attributes' means are above a given threshold. The aim is to find the feasible arm with the largest overall mean. We first derive a lower bound on the error probability of any algorithm in this setting. We then propose Feasibility Constrained Successive Rejects (FCSR), a novel algorithm that identifies the best arm while ensuring feasibility. We show that it attains optimal dependence on problem parameters up to constant factors in the exponent. Empirically, FCSR outperforms natural baselines while preserving feasibility guarantees.
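To make the objective concrete, the following minimal sketch computes the target arm when the attribute means are known exactly, which is the ground truth a learner is trying to recover from noisy samples. It is not the FCSR algorithm. The function name `best_feasible_arm`, the example values, and the choice to define an arm's "overall mean" as the average of its attribute means are assumptions made for illustration; the paper's own definition may differ.

```python
from typing import Optional, Sequence


def best_feasible_arm(
    means: Sequence[Sequence[float]], threshold: float
) -> Optional[int]:
    """Return the index of the feasible arm with the largest overall mean.

    means[i][j] is the (known) mean of attribute j of arm i.
    An arm is feasible only if every attribute mean is above `threshold`.
    ASSUMPTION: the overall mean of an arm is the average of its attribute
    means. Returns None when no arm is feasible.
    """
    best_index: Optional[int] = None
    best_value = float("-inf")
    for index, attribute_means in enumerate(means):
        if not all(m > threshold for m in attribute_means):
            continue  # infeasible: at least one attribute fails the threshold
        overall = sum(attribute_means) / len(attribute_means)
        if overall > best_value:
            best_index, best_value = index, overall
    return best_index


# Hypothetical instance: arm 0 has the largest average but is infeasible,
# because its second attribute falls below the threshold of 0.4.
example_means = [
    [0.95, 0.30],
    [0.60, 0.50],
    [0.50, 0.45],
]
target = best_feasible_arm(example_means, threshold=0.4)
```

In this hypothetical instance, an unconstrained best-arm procedure would favor arm 0, while the constrained objective selects among arms 1 and 2 only. This is the distinction that makes feasibility part of the identification problem.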

Paper Structure (46 sections, 11 theorems, 119 equations, 2 figures, 1 table, 8 algorithms)

Introduction
Our Contributions.
Related Work.
PROBLEM FORMULATION
Constrained Grouped Bandit Setting.
Objective.
LOWER BOUND
Complexity Parameter Definitions.
FEASIBILITY CONSTRAINED SUCCESSIVE REJECTS
Notation
Algorithm description
Theoretical Analysis
NUMERICAL ANALYSIS
Experimental Setup
Baselines
...and 31 more sections

Key Result

Theorem 1

Let $\mathcal{C}_{FC}(a;K)$ denote the set of bandit instances with $K=M \geq 2$ whose difficulty $H_{FC}$ is upper bounded by some constant $a$. Then there exists a bandit instance $\mathcal{G} \in \mathcal{C}_{FC}(a;K)$ such that the probability that any learner incorrectly reports the best arm is at least

Figures (2)

Figure 1: Performance comparison on synthetic instances. Each subplot shows $\ln(1-\mathrm{Accuracy})$ vs. budget for four algorithms (US, ETC, SR, FCSR). Shaded bands are $\pm1\sigma$ using the delta-method approximation $\mathrm{Var}(\ln(1-\hat{A})) \approx \tfrac{\hat{A}}{N(1-\hat{A})}$ with $N=2000$.
Figure 2: Accuracy of different algorithms at two budgets. Error bars show 95% confidence intervals for a Bernoulli mean using the empirical variance $\widehat{\mathrm{Var}}(X)=\hat{p}(1-\hat{p})$, i.e., $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/N}$ with $N=1000$ independent runs.
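The error bars described in the two captions can be reproduced from an empirical accuracy and a run count. The sketch below applies the formulas exactly as stated in the captions. The function names `log_error_band` and `bernoulli_ci95` are hypothetical and are not taken from the authors' code.

```python
import math
from typing import Tuple


def log_error_band(accuracy: float, n_runs: int) -> Tuple[float, float, float]:
    """Return (center, lower, upper) for ln(1 - accuracy) with a +/- 1 sigma band.

    Uses the delta-method approximation from the Figure 1 caption:
    Var(ln(1 - A)) ~= A / (N * (1 - A)). Requires 0 <= accuracy < 1.
    """
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must lie in [0, 1) for ln(1 - accuracy)")
    center = math.log(1.0 - accuracy)
    sigma = math.sqrt(accuracy / (n_runs * (1.0 - accuracy)))
    return center, center - sigma, center + sigma


def bernoulli_ci95(p_hat: float, n_runs: int) -> Tuple[float, float]:
    """Return the normal-approximation 95% interval from the Figure 2 caption:
    p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / N).
    """
    half_width = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n_runs)
    return p_hat - half_width, p_hat + half_width


# Run counts as given in the captions; the accuracy values are illustrative.
center, lower, upper = log_error_band(accuracy=0.5, n_runs=2000)
ci_low, ci_high = bernoulli_ci95(p_hat=0.5, n_runs=1000)
```

The delta-method expression follows from $\mathrm{Var}(\hat{A}) = \hat{A}(1-\hat{A})/N$ and the derivative of $\ln(1-x)$, which is $-1/(1-x)$. Squaring the derivative and multiplying by the variance gives $\hat{A}/(N(1-\hat{A}))$. The band widens as accuracy approaches 1, and the approximation is undefined at $\hat{A}=1$.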

Theorems & Definitions (19)

Remark 1
Theorem 1: Lower Bound
Theorem 2: Performance of FCSR
Lemma 3
Lemma 4: Budget compliance
proof
Lemma 5: Bernoulli TBP Lower Bound
proof
Lemma 6: Risky Class Lower Bound
proof
...and 9 more
