
Best Arm Identification for Stochastic Rising Bandits

Marco Mussi, Alessandro Montenegro, Francesco Trovó, Marcello Restelli, Alberto Maria Metelli

TL;DR

The authors study fixed-budget Best Arm Identification (BAI) in Stochastic Rising Bandits (SRBs), where each arm's expected reward μ_i(n) is non-decreasing and concave in the number of pulls n. They introduce two algorithms: R-UCBE, an optimistic UCB-like method built on rising estimators, and R-SR, a phase-based successive-rejects method that uses pessimistic estimates. For a sufficiently large budget, both algorithms come with guarantees on the probability of identifying the best arm and on the decay of the simple regret. A lower bound on the error probability shows that the need for a sufficiently large budget is unavoidable, and R-SR matches this bound up to constants. Experiments on synthetic SRBs and a real-world online model-selection task show strong performance relative to baselines, with R-UCBE excelling early and R-SR remaining competitive over longer horizons. Together, the results clarify the learnability and practical viability of BAI in SRBs and point to directions for future budget-aware algorithm design.
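To make the R-UCBE description concrete, the following minimal Python sketch runs a UCB-style loop on a simulated SRB: at every round it pulls the arm with the largest optimistic index. Everything beyond that loop is an assumption of this sketch rather than the paper's specification: the window size $h = \max(1, \lfloor \varepsilon n \rfloor)$, the estimator that projects each of the last $h$ rewards to pull $T$ using a finite-difference slope, the bonus $\sigma (T - n + h - 1)\sqrt{a/h^3}$, Gaussian reward noise, and recommending the most-pulled arm. The helper names (`window`, `ucb_index`, `r_ucbe_sketch`) are hypothetical.

```python
import math
import random


def window(n, eps):
    """Window size h = max(1, floor(eps * n)); a hypothetical choice."""
    return max(1, int(eps * n))


def ucb_index(x, T, eps, a, sigma):
    """Optimistic index for an arm whose observed rewards are x[0], ..., x[n-1]."""
    n = len(x)
    h = window(n, eps)
    if n < 2 * h:
        return math.inf  # too few samples to estimate a slope: force a pull
    total = 0.0
    for l in range(n - h, n):  # 0-based positions of the last h rewards
        slope = (x[l] - x[l - h]) / h  # finite-difference slope over h pulls
        total += x[l] + (T - (l + 1)) * slope  # project the reward to pull T
    # Assumed bonus shape (not taken from the paper), roughly scaling like
    # the noise of the projected estimate.
    bonus = sigma * (T - n + h - 1) * math.sqrt(a / h ** 3)
    return total / h + bonus


def r_ucbe_sketch(means, T, eps=0.25, a=1.0, sigma=0.01, seed=0):
    """Pull the arm with the largest index for T rounds, then recommend the
    most-pulled arm (a choice of this sketch, not taken from the paper)."""
    rng = random.Random(seed)
    x = [[] for _ in means]
    for _ in range(T):
        scores = [ucb_index(xi, T, eps, a, sigma) for xi in x]
        i = max(range(len(means)), key=scores.__getitem__)
        n = len(x[i]) + 1  # this is the n-th pull of arm i
        x[i].append(means[i](n) + rng.gauss(0.0, sigma))  # Gaussian noise
    pulls = [len(xi) for xi in x]
    return max(range(len(means)), key=pulls.__getitem__), pulls


# Hypothetical instance: a flat arm and a rising, concave arm.
means = [lambda n: 0.2, lambda n: 0.9 - 0.5 / (n + 1)]
best, pulls = r_ucbe_sketch(means, T=1000)
print(best, pulls)
```

Recommending the most-pulled arm, instead of the arm with the largest final index, is a common choice for UCB-style identification because it is not driven by the exploration bonus; the paper's own recommendation rule may differ.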

Abstract

Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the expected reward of the available options increases every time they are selected. This setting captures a wide range of scenarios in which the available options are learning entities whose performance improves (in expectation) over time (e.g., online best model selection). While previous works addressed the regret minimization problem, this paper focuses on the fixed-budget Best Arm Identification (BAI) problem for SRBs. In this scenario, given a fixed budget of rounds, we are asked to provide a recommendation about the best option at the end of the identification process. We propose two algorithms to tackle the above-mentioned setting, namely R-UCBE, which resorts to a UCB-like approach, and R-SR, which employs a successive reject procedure. Then, we prove that, with a sufficiently large budget, they provide guarantees on the probability of properly identifying the optimal option at the end of the learning process and on the simple regret. Furthermore, we derive a lower bound on the error probability, matched by our R-SR (up to constants), and illustrate how the need for a sufficiently large budget is unavoidable in the SRB setting. Finally, we numerically validate the proposed algorithms in both synthetic and realistic environments.
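A successive-rejects procedure can be sketched in the same spirit. This minimal Python version reuses the phase lengths of the classic (stationary) Successive Rejects algorithm and, following the description of R-SR as a successive reject procedure with pessimistic estimates, eliminates at the end of each phase the surviving arm with the lowest pessimistic estimate. Taking that estimate to be the mean of the last $h = \max(1, \lfloor \varepsilon n \rfloor)$ rewards, the Gaussian reward model, and the function names (`pessimistic`, `r_sr_sketch`) are assumptions of this sketch.

```python
import math
import random


def pessimistic(x, eps):
    """Mean of the last h = max(1, floor(eps * n)) rewards (assumed window)."""
    h = max(1, int(eps * len(x)))
    return sum(x[-h:]) / h


def r_sr_sketch(means, T, eps=0.25, sigma=0.05, seed=0):
    """Successive rejects on a simulated SRB with Gaussian reward noise."""
    rng = random.Random(seed)
    K = len(means)
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    x = [[] for _ in range(K)]
    active = list(range(K))
    n_prev = 0
    for k in range(1, K):  # K - 1 phases, one rejection per phase
        n_k = math.ceil((T - K) / (log_bar * (K + 1 - k)))
        for i in active:  # bring every surviving arm up to n_k pulls
            for _ in range(n_k - n_prev):
                n = len(x[i]) + 1
                x[i].append(means[i](n) + rng.gauss(0.0, sigma))
        n_prev = n_k
        worst = min(active, key=lambda i: pessimistic(x[i], eps))
        active.remove(worst)
    return active[0], [len(xi) for xi in x]


# Hypothetical instance: the concave rising arm 1 starts as the worst arm.
means = [
    lambda n: 0.35,
    lambda n: 0.7 - 0.8 / (n + 1),
    lambda n: 0.5 - 0.1 / (n + 1),
]
best, pulls = r_sr_sketch(means, T=600)
print(best, pulls)
```

With $K = 3$ and $T = 600$, the phases end after 150 and 224 pulls per surviving arm, so 598 of the 600 rounds are used. Using only the most recent window keeps early, low rewards of a rising arm from dragging down its estimate.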

Paper Structure

This paper contains 28 sections, 32 theorems, 178 equations, 9 figures, 2 tables, and 3 algorithms.

Key Result

Lemma 1: Concentration of $\hat{\mu}_i$

Under the rising and concave assumption on the expected rewards, for every $a > 0$, with probability at least $1 - 2TKe^{-a/2}$ the following holds simultaneously for every arm $i \in \llbracket K \rrbracket$ and every number of pulls $n \in \llbracket 0, T \rrbracket$: … where: …
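The confidence level in Lemma 1 can be turned into a rule for choosing $a$: requiring $2TKe^{-a/2} \le \delta$ is equivalent to $a \ge 2\ln(2TK/\delta)$. The small Python helpers below (names hypothetical) only perform this arithmetic on the stated union bound.

```python
import math


def exploration_level(T, K, delta):
    """Smallest a such that 2 * T * K * exp(-a / 2) <= delta."""
    return 2.0 * math.log(2.0 * T * K / delta)


def failure_probability(T, K, a):
    """Union-bound failure probability 2 * T * K * exp(-a / 2) from Lemma 1."""
    return 2.0 * T * K * math.exp(-a / 2.0)


a = exploration_level(T=10_000, K=5, delta=0.05)
print(round(a, 2), failure_probability(10_000, 5, a))
```

For $T = 10\,000$ rounds and $K = 5$ arms, a failure probability of $\delta = 0.05$ requires $a \approx 29.02$; because $a$ grows only logarithmically in $TK/\delta$, the union bound over all arms and pull counts is relatively cheap.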

Figures (9)

  • Figure 1: Graphical representation of the pessimistic $\hat{\mu}_i(N_{i,t-1})$ and the optimistic $\check{\mu}_i^T(N_{i,t-1})$ estimators.
  • Figure 2: Expected values $\mu_i(n)$ for the arms.
  • Figure 3: Empirical error rate (100 runs, mean $\pm$ 95% c.i.).
  • Figure 4: Empirical simple regret (100 runs, mean $\pm$ 95% c.i.).
  • Figure 5: Empirical error rate for the R-UCBE at different $a$ (1000 runs, mean $\pm$ 95% c.i.).
  • ...and 4 more figures
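The error-rate and simple-regret figures report a mean over independent runs with a 95% confidence interval. A minimal sketch of that post-processing follows, assuming a normal-approximation interval (mean ± 1.96 standard errors) and a simple regret defined as the gap, at the end of the budget, between the expected reward of the best arm and that of the recommended arm. The helper names and the example numbers are hypothetical, and the paper's exact procedure may differ.

```python
import math
import statistics


def mean_ci95(values):
    """Mean and half-width of a normal-approximation 95% confidence interval."""
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), half


def run_metrics(recommended, final_means):
    """Error indicator and simple regret of one run (assumed definitions)."""
    best = max(range(len(final_means)), key=final_means.__getitem__)
    error = 1.0 if recommended != best else 0.0
    regret = final_means[best] - final_means[recommended]
    return error, regret


# Hypothetical example: expected rewards at the end of the budget, and
# recommendations from 100 runs (93 correct, 7 picking the runner-up).
final_means = [0.5, 0.7, 0.65]
recommendations = [1] * 93 + [2] * 7
errors, regrets = zip(*(run_metrics(r, final_means) for r in recommendations))
error_rate, error_half = mean_ci95(errors)
mean_regret, regret_half = mean_ci95(regrets)
print(f"error rate {error_rate:.2f} +/- {error_half:.3f}")
print(f"simple regret {mean_regret:.4f} +/- {regret_half:.4f}")
```

With 7 wrong recommendations out of 100 runs, the estimated error rate is about $0.07 \pm 0.05$, which illustrates how wide such intervals remain at 100 runs when errors are rare.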

Theorems & Definitions (56)

  • Lemma 1: Concentration of $\hat{\mu}_i$
  • Lemma 2: Concentration of $\check{\mu}_i^T$
  • Theorem 4.1
  • Corollary 1
  • Theorem 5.1
  • Corollary 2
  • Remark 5.1: About the minimum time budget $T$
  • Theorem 6.1
  • Theorem 6.2
  • Lemma 3
  • ...and 46 more