Constrained Best Arm Identification with Tests for Feasibility
Ting Cai, Kirthevasan Kandasamy
TL;DR
This paper defines a novel constrained best-arm identification problem in which each arm has a performance distribution and multiple feasibility constraints that can be tested separately. It introduces a LUCB-inspired algorithm that adaptively chooses, for each arm it pulls, whether to test the arm's performance or one of its feasibility constraints, and it proves a fixed-confidence guarantee with a gap-dependent upper bound on sample complexity and a matching lower bound. The key contributions are a problem-dependent complexity framework, a $\delta$-correct algorithm with asymptotically optimal sample complexity as $\delta \to 0$, and strong empirical results on synthetic and real-world drug-discovery datasets. The work demonstrates that testing feasibility separately can substantially reduce the number of samples needed to identify the feasible arm with maximal performance in practical settings.
Abstract
Best arm identification (BAI) aims to identify the highest-performance arm among a set of $K$ arms by collecting stochastic samples from each arm. In real-world problems, the best arm needs to satisfy additional feasibility constraints. The limited prior work on BAI with feasibility constraints typically assumes that the performance and constraints are observed simultaneously on each pull of an arm. However, this assumption does not reflect most practical use cases. For example, in drug discovery, we wish to find the most potent drug whose toxicity and solubility are below certain safety thresholds, and these safety experiments can be conducted separately from the potency measurement. This calls for BAI algorithms that decide not only which arm to pull but also whether to test for the arm's performance or feasibility. In this work, we study feasible BAI, which allows a decision-maker to choose a tuple $(i,\ell)$, where $i\in [K]$ denotes an arm and $\ell$ denotes whether she wishes to test for its performance ($\ell=0$) or any of its $N$ feasibility constraints ($\ell\in[N]$). We focus on the fixed-confidence setting, where the goal is to identify the \textit{feasible} arm with the \textit{highest performance} with probability at least $1-\delta$. We propose an efficient algorithm and upper-bound its sample complexity, showing that our algorithm naturally adapts to the problem's difficulty and eliminates arms by worse performance or infeasibility, whichever is easier. We complement this upper bound with a lower bound showing that our algorithm is \textit{asymptotically ($\delta\rightarrow 0$) optimal}. Finally, we empirically show that our algorithm outperforms other state-of-the-art BAI algorithms on both synthetic and real-world datasets.
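To make the sampling interface concrete, the following is a minimal Python sketch of the problem setup only, not the paper's algorithm. It assumes Gaussian noise, 0-indexed arms, and that an arm is feasible when every constraint mean is at or below its threshold (as in the toxicity and solubility example). The class name FeasibleBAIInstance and all of its fields and methods are hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FeasibleBAIInstance:
    """Hypothetical environment: K arms, each with 1 performance mean and N constraint means."""
    perf_means: List[float]        # perf_means[i]: mean performance of arm i (0-indexed)
    cons_means: List[List[float]]  # cons_means[i][l-1]: mean of constraint l for arm i
    thresholds: List[float]        # assumed rule: feasible on constraint l if mean <= thresholds[l-1]
    noise_sd: float = 1.0          # assumed Gaussian noise level
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def pull(self, i: int, ell: int) -> float:
        """Draw one sample for the tuple (i, ell).

        ell == 0 tests the performance of arm i;
        ell in 1..N tests feasibility constraint ell of arm i.
        """
        mean = self.perf_means[i] if ell == 0 else self.cons_means[i][ell - 1]
        return self.rng.gauss(mean, self.noise_sd)

    def is_feasible(self, i: int) -> bool:
        """Ground truth feasibility: all N constraint means are within their thresholds."""
        return all(m <= t for m, t in zip(self.cons_means[i], self.thresholds))

    def best_feasible_arm(self) -> Optional[int]:
        """Ground truth target: the feasible arm with the highest mean performance."""
        feasible = [i for i in range(len(self.perf_means)) if self.is_feasible(i)]
        if not feasible:
            return None
        return max(feasible, key=lambda i: self.perf_means[i])


# Illustrative instance with K = 3 arms and N = 2 constraints (made-up numbers).
inst = FeasibleBAIInstance(
    perf_means=[0.9, 0.7, 0.5],
    cons_means=[[0.8, 0.2], [0.3, 0.4], [0.1, 0.1]],
    thresholds=[0.5, 0.5],
)
print(inst.best_feasible_arm())  # arm 0 has the best performance but violates constraint 1
```

In this illustrative instance, arm 0 can be ruled out either by sampling its first constraint or by comparing performances, which is the kind of choice between tests that the abstract describes; a fixed-confidence algorithm would only see the output of pull and must return the target arm with probability at least $1-\delta$.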
