Top Feasible-Arm Subset Identification in Constrained Multi-Armed Bandit with Limited Budget

Hyeong Soo Chang

TL;DR

This work tackles identifying the top-$m$ feasible arms in a constrained multi-armed bandit where sampling is budgeted by a horizon $H$ and each arm yields both a reward and a cost. It introduces CSAR, a constrained extension of SAR, which combines feasibility testing (via costs) with top-arm selection (via reward gaps) within each phase. The author proves a finite-time bound on the probability of incorrect identification that decays exponentially in $H$, with a complexity term involving the minimum gaps $\Delta_c$ and $\Delta$. The paper also discusses robustness to tie-breaks, potential ranking guarantees, and practical considerations such as tolerance parameters for handling equalities. The results advance efficient identification of feasible designs in constrained simulation-optimization settings, with implications for budgeted ranking and selection under cost constraints.

Abstract

We present an algorithm, "constrained successive accept or reject (CSAR)," for identifying the subset of top feasible arms from a given finite set of arms under a limited sampling budget equal to a given time horizon, when the sequential dynamics of the arms follow the model of a constrained multi-armed bandit. We provide a finite-time upper bound on the probability of incorrect identification by CSAR that converges to zero at an exponential rate in the sampling budget.

Paper Structure (7 sections, 1 theorem, 36 equations)

Introduction
Top Feasible-Arm Subset Identification: Algorithm
Preliminaries
Successive Accept or Reject: Issues
Constrained Successive Accept or Reject
Performance analysis
Concluding Remarks

Key Result

Theorem 3.1

Assume that $\mu(a) \neq \mu(a')$ for all $a, a'\in A$ with $a\neq a'$, and that $C(a) \neq \tau$ for all $a\in A$. Let $\Delta_{\min} = \min \{(\min_{a\in A} \Delta_c(a))^2/2, (\min_{a\in A} \Delta(a))^2/8\}$. Then the probability of incorrect identification by CSAR is bounded above by

Theorems & Definitions (2)

Theorem 3.1
proof
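
As a small illustration of the complexity term in Theorem 3.1, the sketch below evaluates $\Delta_{\min} = \min \{(\min_a \Delta_c(a))^2/2, (\min_a \Delta(a))^2/8\}$ from per-arm gap values supplied by the caller. The function name `delta_min` and its arguments are hypothetical, and the summary above does not define $\Delta_c(a)$ and $\Delta(a)$, so the gaps are taken as given inputs rather than derived from $\mu$, $C$, and $\tau$. The positivity check is an assumption: it reflects what the theorem's conditions (distinct means, no cost equal to $\tau$) appear intended to guarantee.

```python
def delta_min(cost_gaps, reward_gaps):
    """Evaluate min{(min cost gap)^2 / 2, (min reward gap)^2 / 8}.

    cost_gaps   -- per-arm values of Delta_c(a), supplied by the caller
    reward_gaps -- per-arm values of Delta(a), supplied by the caller
    """
    if not cost_gaps or not reward_gaps:
        raise ValueError("both gap lists must be non-empty")
    smallest_cost_gap = min(cost_gaps)
    smallest_reward_gap = min(reward_gaps)
    # Assumption: gaps are strictly positive, so the term is non-degenerate.
    if smallest_cost_gap <= 0 or smallest_reward_gap <= 0:
        raise ValueError("gaps must be strictly positive")
    return min(smallest_cost_gap ** 2 / 2, smallest_reward_gap ** 2 / 8)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    print(delta_min([1.0, 2.0], [1.0, 3.0]))  # 0.125
```

In this example the reward-gap term $(1.0)^2/8 = 0.125$ is smaller than the cost-gap term $(1.0)^2/2 = 0.5$, so it determines $\Delta_{\min}$. Because of the differing constants, equal minimum gaps always leave the reward term as the binding one.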
