Balancing Performance and Costs in Best Arm Identification

Michael O. Harding, Kirthevasan Kandasamy

TL;DR

This work introduces a risk-based framework for Best Arm Identification in multi-armed bandits that explicitly balances final performance with sampling costs, enabling adaptive termination without fixed budgets or confidence levels. It formalizes two risk measures—misidentification probability and simple regret—each augmented by a per-sample cost, and develops DBCARE, a dynamically budgeted elimination (racing) algorithm that adapts to problem difficulty. The authors derive phase-transition lower bounds and provide near-matching upper bounds for both two-arm and K-arm settings, demonstrating the method's theoretical near-optimality. Empirical results on simulations and a drug-discovery dataset show DBCARE consistently outperforming traditional fixed-budget and fixed-confidence approaches, highlighting the practical value of cost-aware BAI in real-world decision problems.

Abstract

We consider the problem of identifying the best arm in a multi-armed bandit model. Despite a wealth of literature on the traditional fixed-budget and fixed-confidence regimes of best arm identification (BAI), it remains unclear to most practitioners how to choose an approach and a corresponding budget or confidence parameter. We propose a new formalism that avoids this dilemma altogether by minimizing a risk functional which explicitly balances the performance of the recommended arm against the cost incurred in learning it. In this framework, a cost is incurred for each observation during the sampling phase, and upon recommending an arm, a performance penalty is incurred for identifying a suboptimal arm. The learner's goal is to minimize the sum of the penalty and the cost. This regime mirrors the priorities of many practitioners, e.g. maximizing profit in an A/B testing framework, better than the classical fixed-budget or fixed-confidence settings. We derive theoretical lower bounds on the risk for each of two choices of performance penalty, the probability of misidentification and the simple regret, and propose an algorithm called DBCARE that matches these lower bounds up to polylog factors on nearly all problem instances. We then demonstrate the performance of DBCARE on a number of simulated models, comparing against fixed-budget and fixed-confidence algorithms to show the shortfalls of existing BAI paradigms on this problem.
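
To make the objective concrete, the sketch below estimates by Monte Carlo the quantity "performance penalty plus per-observation cost" described in the abstract. It is not DBCARE. It assumes a deliberately naive policy that pulls every arm the same fixed number of times and recommends the empirical best, and it assumes unit-variance Gaussian rewards. The function name `uniform_policy_risk` and the parameters `mu`, `pulls_per_arm`, `cost`, and `penalty` are illustrative choices and do not come from the paper.

```python
import random


def uniform_policy_risk(mu, pulls_per_arm, cost, penalty="mi", trials=2000, seed=0):
    """Estimate risk = E[penalty] + cost * (total pulls) for a naive policy.

    Assumptions (not from the paper): rewards are Gaussian with unit variance,
    and the policy pulls each arm `pulls_per_arm` times, then recommends the
    arm with the highest empirical mean. With zero pulls it guesses at random.
    """
    rng = random.Random(seed)
    best_mean = max(mu)
    best_arm = mu.index(best_mean)
    total_pulls = pulls_per_arm * len(mu)
    total_risk = 0.0
    for _ in range(trials):
        if pulls_per_arm == 0:
            recommended = rng.randrange(len(mu))
        else:
            empirical = [
                sum(rng.gauss(m, 1.0) for _ in range(pulls_per_arm)) / pulls_per_arm
                for m in mu
            ]
            recommended = max(range(len(mu)), key=empirical.__getitem__)
        if penalty == "mi":
            # Misidentification penalty: 1 if the wrong arm is recommended.
            loss = 0.0 if recommended == best_arm else 1.0
        else:
            # Simple-regret penalty: gap between the best and recommended means.
            loss = best_mean - mu[recommended]
        total_risk += loss + cost * total_pulls
    return total_risk / trials


if __name__ == "__main__":
    # Sweep the per-arm budget to see the trade-off between penalty and cost.
    for n in (0, 5, 20, 80):
        risk = uniform_policy_risk([0.0, 0.5], n, cost=0.001)
        print(f"pulls per arm = {n:3d}, estimated risk = {risk:.3f}")
```

Sweeping `pulls_per_arm` shows the tension the abstract describes: too few pulls leaves the penalty term large, while too many lets the sampling cost dominate. The best fixed budget depends on the unknown gap between the arms, which is why a fixed-budget rule has to be tuned per problem.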

Paper Structure

This paper contains 17 sections, 17 theorems, 59 equations, 5 figures, 1 algorithm.

Key Result

Corollary 1

(Corollary of Theorem thm:k-arm-lb-pm; lower bound on $\mathcal{R}_{\rm MI}$) Fix a gap $\Delta>0$ and the cost $c$ per arm pull. Then, for any policy $\pi$, we have

Figures (5)

  • Figure 1: Illustrations of the lower and upper bounds on the risk for $\mathcal{R}_{\rm MI}$ (on the left) and $\mathcal{R}_{\rm SR}$ (on the right) in the 2-arm case presented throughout § \ref{sec:2-arm}, with the performance of the policy which guesses an arm at random without pulling at all (Guess) included as a point of reference.
  • Figure 2: Comparisons between the oracular policy, DBCARE, and fixed budget and confidence algorithms for $\mathcal{R}_{\rm MI}$ and $\mathcal{R}_{\rm SR}$. $Y$-axes are adjusted per setting to highlight problem-specific behavior. Confidence regions represent empirical average risk $\pm$2 SE.
  • Figure 3: Comparisons between DBCARE and fixed budget and confidence algorithms for $\mathcal{R}_{\rm MI}$ and $\mathcal{R}_{\rm SR}$ on a drug discovery dataset. $Y$-axes are adjusted per setting to highlight problem-specific behavior. Error bars represent empirical average risk $\pm$2 SE.
  • Figure 4: Comparisons between DBCARE and fixed budget and confidence algorithms for $\mathcal{R}_{\rm MI}$ and $\mathcal{R}_{\rm SR}$ in the $K$-arm 1-sparse setting. $Y$-axes are adjusted per setting to highlight problem-specific behavior. Confidence regions represent empirical average risk $\pm$2 SE.
  • Figure 5: Comparisons between DBCARE and fixed budget and confidence algorithms for $\mathcal{R}_{\rm MI}$ and $\mathcal{R}_{\rm SR}$ in the $K$-arm linear decay setting. $Y$-axes are adjusted per setting to highlight problem-specific behavior. Confidence regions represent empirical average risk $\pm$2 SE.

Theorems & Definitions (32)

  • Example: Advertising
  • Corollary 1
  • Proposition 1
  • Corollary 2
  • Corollary 3
  • Proposition 2
  • Corollary 4
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • ...and 22 more