Top Feasible-Arm Subset Identification in Constrained Multi-Armed Bandit with Limited Budget
Hyeong Soo Chang
TL;DR
This work addresses the problem of identifying the top-$m$ feasible arms in a constrained multi-armed bandit. Sampling is limited by a budget equal to a time horizon $H$, and each arm yields both a reward and a cost. It introduces CSAR, a constrained extension of successive accept or reject (SAR). Within each phase, CSAR combines feasibility testing (based on costs) with top-arm selection (based on reward gaps). The author proves a finite-time upper bound on the probability of incorrect identification that decays exponentially in $H$, with a complexity term involving the minimum gaps $\Delta_c$ and $\Delta$. The paper also discusses robustness to tie-breaking, possible ranking guarantees, and practical considerations such as tolerance parameters for handling equalities. The results advance efficient identification of feasible designs in constrained simulation-optimization settings, with implications for budgeted ranking and selection under cost constraints.
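CSAR is described as a phased extension of SAR. As background, the sketch below computes the phase-length schedule of the original SAR algorithm. With $K$ arms and budget $H \ge K$, let $\overline{\log}(K) = \tfrac12 + \sum_{i=2}^{K} 1/i$, $n_0 = 0$, and $n_k = \lceil (H-K) / (\overline{\log}(K)(K+1-k)) \rceil$ for $k = 1, \dots, K-1$. In phase $k$, each of the $K+1-k$ still-active arms receives $n_k - n_{k-1}$ additional samples. Whether CSAR uses exactly this schedule is an assumption made only for illustration, and the function names are hypothetical.

```python
import math


def logbar(K: int) -> float:
    """SAR normalizing constant: 1/2 + sum_{i=2}^{K} 1/i."""
    return 0.5 + sum(1.0 / i for i in range(2, K + 1))


def sar_phase_lengths(K: int, H: int) -> list[int]:
    """Cumulative per-arm sample counts n_1, ..., n_{K-1} (hypothetical helper).

    Assumes the SAR schedule n_k = ceil((H - K) / (logbar(K) * (K + 1 - k))).
    """
    if H < K:
        raise ValueError("budget H must be at least the number of arms K")
    lb = logbar(K)
    return [math.ceil((H - K) / (lb * (K + 1 - k))) for k in range(1, K)]


def total_samples(K: int, H: int) -> int:
    """Total pulls: in phase k, each of the K + 1 - k active arms gets n_k - n_{k-1} more samples."""
    n = [0] + sar_phase_lengths(K, H)
    return sum((K + 1 - k) * (n[k] - n[k - 1]) for k in range(1, K))
```

For example, with $K = 4$ and $H = 100$ the schedule gives cumulative per-arm counts 16, 21, and 31, which use 99 of the 100 available samples. The ceiling adds at most one sample per term, and the $\overline{\log}(K)$ normalization is what keeps the total within the budget $H$.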
Abstract
We present an algorithm, "constrained successive accept or reject" (CSAR), for identifying the subset of top feasible arms from a given finite set of arms. The sampling budget is limited to a given time horizon, and the sequential dynamics of the arms follow a constrained multi-armed bandit model. We provide a finite-time upper bound on the probability that CSAR identifies the subset incorrectly, and this bound converges to zero at an exponential rate in the sampling budget.
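A small helper can make the identification target concrete. This minimal sketch rests on three assumptions. First, an arm counts as feasible when its expected cost is at most a threshold; the exact constraint form is not stated in this section. Second, reward ties are broken by arm index. Third, the function names are hypothetical. The helper computes the target subset from known means and does not implement CSAR's sampling. An identification is "incorrect" when an algorithm's output differs from this subset.

```python
from typing import Sequence


def top_m_feasible(reward_means: Sequence[float], cost_means: Sequence[float],
                   cost_threshold: float, m: int) -> list[int]:
    """Sorted indices of the m highest-reward arms whose mean cost is <= cost_threshold.

    Assumption: feasibility means expected cost <= cost_threshold.
    Reward ties are broken by smaller index; if fewer than m arms are
    feasible, all feasible arms are returned.
    """
    if len(reward_means) != len(cost_means):
        raise ValueError("reward_means and cost_means must have the same length")
    feasible = [i for i, c in enumerate(cost_means) if c <= cost_threshold]
    # Order feasible arms by decreasing reward, then by increasing index.
    feasible.sort(key=lambda i: (-reward_means[i], i))
    return sorted(feasible[:m])


def is_correct_identification(returned: Sequence[int], reward_means: Sequence[float],
                              cost_means: Sequence[float], cost_threshold: float,
                              m: int) -> bool:
    """True when a returned subset equals the true top-m feasible subset."""
    return sorted(returned) == top_m_feasible(reward_means, cost_means, cost_threshold, m)
```

For instance, consider rewards (0.9, 0.7, 0.8, 0.4), costs (0.6, 0.3, 0.2, 0.1), threshold 0.5, and $m = 2$. Arm 0 has the highest reward but is infeasible, so the target subset is arms 1 and 2.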
