On the Problem of Best Arm Retention

Houshuang Chen, Yuchen He, Chihao Zhang

TL;DR

This work tackles Best Arm Retention (BAR), a memory-conscious extension of Best Arm Identification that aims to retain $m$ arms out of $n$ with the best arm included, and it introduces the $r$-BAR relaxation, which targets an expected gap of at most $r$. It provides a simple $\varepsilon$-PAC BAR algorithm based on MedianElimination with a near-tight sample bound of $\Theta\left(\frac{n-m}{\varepsilon^2}\log\frac{n-m}{n\delta}\right)$, and establishes a likelihood-ratio-based lower bound showing near-tightness across broad parameter ranges. For $r$-BAR, it derives a tight sample complexity of $\Theta\left(\frac{(n-m)^3}{(nr)^2}\right)$ and a corresponding regret complexity of $\Theta\left(\frac{(n-m)^2}{nr}\right)$ (up to an adaptivity factor when $m$ is large), along with explicit algorithms that achieve these rates and a lower bound that nearly matches them. The paper also highlights a fundamental difference between the sample complexity and regret objectives, proposes adaptive, instance-aware strategies (e.g., MirrorDescent-based FindBest), and ends with a conjecture that the regret bounds are tight, inviting further research into optimal BAR procedures and instance-dependent analyses.
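To get a feel for how the rates quoted above scale, the following minimal sketch evaluates them numerically. It drops all constant factors hidden by $\Theta(\cdot)$, so the outputs are orders of growth rather than actual sample counts. The function names are hypothetical and do not come from the paper.

```python
import math


def pac_bar_rate(n, m, eps, delta):
    """Order of the (eps, delta)-PAC BAR sample bound, constants dropped.

    Evaluates (n - m) / eps^2 * log((n - m) / (n * delta)).
    """
    if not 0 < m < n:
        raise ValueError("need 0 < m < n")
    if not 0 < delta < (n - m) / n:
        # Outside this range the logarithm is not positive.
        raise ValueError("need 0 < delta < (n - m) / n")
    return (n - m) / eps**2 * math.log((n - m) / (n * delta))


def r_bar_sample_rate(n, m, r):
    """Order of the r-BAR sample complexity: (n - m)^3 / (n r)^2."""
    return (n - m) ** 3 / (n * r) ** 2


def r_bar_regret_rate(n, m, r):
    """Order of the r-BAR regret: (n - m)^2 / (n r)."""
    return (n - m) ** 2 / (n * r)


if __name__ == "__main__":
    # Illustrative parameters only (an assumption, not taken from the paper).
    print(pac_bar_rate(n=100, m=10, eps=0.1, delta=0.05))
    print(r_bar_sample_rate(n=100, m=10, r=0.1))
    print(r_bar_regret_rate(n=100, m=10, r=0.1))
```

All three quantities shrink as $m$ approaches $n$, which matches the intuition that retaining more arms makes it easier to keep the best one.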

Abstract

This paper presents a comprehensive study of the Best Arm Retention (BAR) problem, which has recently found applications in streaming algorithms for multi-armed bandits. In the BAR problem, the goal is to retain $m$ arms out of $n$, with the best arm included, after some trials in the stochastic multi-armed bandit setting. We first investigate pure exploration for the BAR problem under different criteria, and then minimize the regret under specific constraints, in the context of further exploration in streaming algorithms.

- We begin by revisiting the lower bound for $(\varepsilon,\delta)$-PAC algorithms for Best Arm Identification (BAI) and adapt the classical KL-divergence argument to derive optimal bounds for $(\varepsilon,\delta)$-PAC algorithms for BAR.
- We further study another variant of the problem, called $r$-BAR, which requires that the expected gap between the best arm and the best retained arm be less than $r$. We prove a tight sample complexity bound for this problem.
- We explore the regret minimization problem for $r$-BAR and develop algorithms beyond pure exploration. We conclude with a conjecture on the optimal regret in this setting.
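The BAR setting described above can be illustrated with a small simulation. This is a naive uniform-sampling baseline, assumed here purely for illustration; it is not the MedianElimination-based algorithm or any other procedure from the paper. Arms are assumed to be Bernoulli, and the names `retain_top_m` and `retention_gap` are hypothetical.

```python
import random


def retain_top_m(means, m, pulls_per_arm, rng):
    """Pull every arm equally often and keep the m arms with the highest empirical mean.

    `means` holds the true Bernoulli success probabilities (unknown to a real learner).
    """
    empirical = []
    for mu in means:
        successes = sum(rng.random() < mu for _ in range(pulls_per_arm))
        empirical.append(successes / pulls_per_arm)
    ranked = sorted(range(len(means)), key=lambda i: empirical[i], reverse=True)
    return set(ranked[:m])


def retention_gap(means, retained):
    """Gap between the best arm overall and the best retained arm.

    BAR asks for this to be zero; r-BAR asks for its expectation to be below r.
    """
    return max(means) - max(means[i] for i in retained)


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.9, 0.5, 0.5, 0.4, 0.3, 0.2]  # illustrative instance, arm 0 is best
    retained = retain_top_m(means, m=3, pulls_per_arm=200, rng=rng)
    print("best arm retained:", 0 in retained)
    print("gap:", retention_gap(means, retained))
```

Averaging `retention_gap` over many independent runs gives an empirical estimate of the expected gap that the $r$-BAR criterion constrains.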

Paper Structure

This paper contains 19 sections, 16 theorems, 42 equations, 6 algorithms.

Key Result

Theorem 1

For any $(\varepsilon,\delta)$-PAC algorithm for BAR satisfying $\varepsilon\leq \frac{1}{8}$ and $\delta\leq \frac{n-m}{n}(1-\beta)$, where $\beta\in (0,1)$ is a universal constant, the sample complexity is

Theorems & Definitions (28)

  • Theorem 1
  • Corollary 2
  • Theorem 3
  • Theorem 4
  • proof
  • Proposition 5: LG21, Theorem 11
  • Proposition 6: EMM06, Theorem 10
  • Theorem 7: Part of Theorem \ref{thm:bar}
  • proof
  • Theorem 8
  • ...and 18 more