Cost Aware Best Arm Identification

Kellen Kanarios, Qining Zhang, Lei Ying

TL;DR

Cost-Aware Best Arm Identification (CABAI) extends traditional best-arm identification by associating a testing cost with each arm and aiming to identify the arm with the highest reward while minimizing expected testing cost. The authors derive a problem-dependent lower bound characterized by $T^*(\boldsymbol{\mu})$ and optimal weight $\boldsymbol{w}^*$, and propose CTAS, an asymptotically cost-optimal sampling framework with a Chernoff-style stopping rule. To address computational concerns, they introduce Chernoff Overlap (CO), a low-complexity square-root-cost-weighted algorithm with $\delta$-PAC guarantees and provable optimality in the two-armed Gaussian case. Empirical results show that cost heterogeneity critically affects sampling and that CO provides fast, robust performance across distributions, while CTAS achieves theoretical optimality at the cost of higher computation. Collectively, the work offers cost-aware strategies for efficient arm identification in two-phase product development pipelines and lays groundwork for cost-aware regret and ETC extensions.
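
To make the phrase "Chernoff-style stopping rule" concrete, the following is a minimal sketch of the generalized likelihood ratio test commonly used in fixed-confidence best arm identification. It is not taken from the paper: the Gaussian rewards with known variance `sigma`, the threshold `log((log(t) + 1) / delta)`, and the function names `glr_statistic` and `should_stop` are all assumptions made for illustration.

```python
import math


def glr_statistic(mean_a, mean_b, n_a, n_b, sigma=1.0):
    """GLR statistic for 'arm a beats arm b', Gaussian rewards, known variance.

    Returns 0 when the empirical ordering does not favour arm a.
    """
    if mean_a <= mean_b:
        return 0.0
    gap = mean_a - mean_b
    return (n_a * n_b) / (n_a + n_b) * gap ** 2 / (2.0 * sigma ** 2)


def should_stop(means, counts, delta, sigma=1.0):
    """Return (stop, best_arm) for empirical means and pull counts.

    The threshold below is a common heuristic choice (an assumption here),
    not the threshold analysed in the paper.
    """
    t = sum(counts)
    best = max(range(len(means)), key=lambda a: means[a])
    threshold = math.log((math.log(t) + 1.0) / delta)
    # Stop only when the empirical best arm beats every rival by enough evidence.
    weakest = min(
        glr_statistic(means[best], means[b], counts[best], counts[b], sigma)
        for b in range(len(means))
        if b != best
    )
    return weakest > threshold, best
```

The rule itself does not look at costs; in this sketch, cost awareness would enter only through how the sampling rule chooses the pull counts.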

Abstract

In this paper, we study a best arm identification problem with dual objectives. In addition to the classic reward, each arm is associated with a cost distribution, and the goal is to identify the arm with the largest reward using the minimum expected cost. We call this problem \emph{Cost Aware Best Arm Identification} (CABAI); it captures the separation of testing and implementation phases in product development pipelines and models the objective shift between phases, i.e., cost for testing and reward for implementation. We first derive a theoretical lower bound for CABAI and propose an algorithm called $\mathsf{CTAS}$ that matches it asymptotically. To reduce the computation of $\mathsf{CTAS}$, we further propose a simple algorithm called \emph{Chernoff Overlap} (CO), based on a square-root rule, which we prove is optimal in simplified two-armed models and which generalizes well in numerical experiments. Our results show that (i) ignoring heterogeneous action costs results in sub-optimality in practice, and (ii) simple algorithms can deliver near-optimal performance over a wide range of problems.
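
One way to see where a square-root rule can come from is a simplified two-armed calculation, sketched below under assumptions that are not stated in the abstract: Gaussian rewards with equal known variance and deterministic per-pull costs `c1` and `c2`. In that setting, the evidence separating the two arms depends on the pull counts only through `1/n1 + 1/n2`, so the cheapest way to reach a given precision minimizes `c1*n1 + c2*n2` at fixed `1/n1 + 1/n2`, which gives pull counts proportional to `1/sqrt(c_a)`. The helper names are hypothetical and this is not the paper's CO algorithm.

```python
import math


def sqrt_rule_pull_fractions(c1, c2):
    """Fractions of pulls per arm when n_a is proportional to 1/sqrt(c_a)."""
    inv = [1.0 / math.sqrt(c1), 1.0 / math.sqrt(c2)]
    total = sum(inv)
    return [x / total for x in inv]


def cost_to_reach(precision, c1, c2, frac1):
    """Total cost so that 1/n1 + 1/n2 equals `precision`.

    A fraction `frac1` of all pulls goes to arm 1. With n1 = frac1 * n and
    n2 = (1 - frac1) * n, we get 1/n1 + 1/n2 = 1 / (n * frac1 * (1 - frac1)).
    """
    n = 1.0 / (precision * frac1 * (1.0 - frac1))
    return n * (frac1 * c1 + (1.0 - frac1) * c2)


if __name__ == "__main__":
    c1, c2, precision = 1.0, 0.01, 0.01
    f_sqrt = sqrt_rule_pull_fractions(c1, c2)[0]
    print("square-root rule cost:", cost_to_reach(precision, c1, c2, f_sqrt))
    print("uniform sampling cost:", cost_to_reach(precision, c1, c2, 0.5))
```

Under these assumptions the expensive arm is pulled less often, but it still absorbs the larger share of the budget, since its cost share scales with `sqrt(c_a)`.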

Paper Structure

This paper contains 36 sections, 19 theorems, 141 equations, 2 figures, 2 tables, 3 algorithms.

Key Result

Theorem 1

Let $\delta\in(0,1)$. For any $\delta$-PAC algorithm and any bandit model $\boldsymbol{\mu} \in \mathcal{M}$, we have: where $T^*(\boldsymbol{\mu})$ is the instance-dependent constant satisfying:

Figures (2)

  • Figure 1: Change in overlap upon pulling arm $2$, where the ellipsoids stand for confidence intervals. Left: wider confidence interval for $\mu_2$. Right: reduced confidence interval upon pulling arm 2.
  • Figure 2: Results averaged over $1000$ trajectories with fixed confidence level $\delta = 10^{-6}$. Panel (a) shows the average number of arm pulls at each time $t$; panel (b) shows statistics of the total cost over these trajectories. The figure was generated with $\boldsymbol{\mu}_1 = [1.5, 1, 0.5]$ and $\boldsymbol{\mu}_2 = [0.9, 0.6, 0.3]$, with costs $\boldsymbol{c} = [1, 0.1, 0.01]$, where $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ are the mean vectors of Poisson and Bernoulli arms, respectively (a Bernoulli mean cannot exceed $1$).
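
The instance in Figure 2 can be set up in a few lines to see why costs matter. The sketch below is illustrative only and does not reproduce the paper's experiments: it assumes deterministic per-pull costs equal to $\boldsymbol{c}$, assigns the Poisson family to the mean vector $[1.5, 1, 0.5]$ because a Bernoulli mean cannot exceed $1$, and uses a naive uniform sampling baseline. All function names are hypothetical.

```python
import math
import random

COSTS = [1.0, 0.1, 0.01]            # per-pull costs, assumed deterministic
POISSON_MEANS = [1.5, 1.0, 0.5]     # means above 1 rule out Bernoulli here
BERNOULLI_MEANS = [0.9, 0.6, 0.3]


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_bernoulli(rng, p):
    """Return 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0


def uniform_sampling_cost(means, sampler, pulls_per_arm, seed=0):
    """Pull every arm equally often; return (empirical means, total cost)."""
    rng = random.Random(seed)
    estimates, total_cost = [], 0.0
    for mu, cost in zip(means, COSTS):
        rewards = [sampler(rng, mu) for _ in range(pulls_per_arm)]
        estimates.append(sum(rewards) / pulls_per_arm)
        total_cost += cost * pulls_per_arm
    return estimates, total_cost


if __name__ == "__main__":
    est, cost = uniform_sampling_cost(BERNOULLI_MEANS, sample_bernoulli, 1000)
    print("empirical means:", est, "total cost:", cost)
```

With these costs, uniform sampling spends about 90% of its budget on the first arm alone, which is the kind of imbalance a cost-aware sampling rule is meant to address.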

Theorems & Definitions (40)

  • Definition 1
  • Theorem 1
  • Corollary 1
  • Remark 1
  • Proposition 1: $\delta$-PAC
  • Theorem 2: Expected Upper Bound
  • Proposition 2: $\delta$-PAC
  • Theorem 3
  • Lemma 1: Cost Decomposition Lemma
  • Proof of Theorem \ref{thm:general-lower-bound}
  • ...and 30 more