Non-Asymptotic Analysis of (Sticky) Track-and-Stop

Riccardo Poiani, Martino Bernasconi, Andrea Celli

TL;DR

This work delivers the first non-asymptotic, finite-confidence guarantees for Track-and-Stop (TaS) and Sticky Track-and-Stop (S-TaS) in pure exploration, covering both single- and multi-answer problems. Combining a concentration-based good-event analysis, a projection-based sampling rule, and an information-theoretic characterization via oracle weights and the quantity $T^{\star}(\boldsymbol{\mu})^{-1}$, the authors derive explicit upper bounds on the expected stopping time that recover asymptotic optimality as $\delta \to 0$. TaS admits an explicit finite-confidence bound when the correct answer is unique, while S-TaS obtains analogous guarantees in multi-answer settings, with bounds that depend on problem-dependent constants such as $T_{\boldsymbol{\mu}}$ and an epsilon-stability radius. These results give explicit guarantees for the finite-confidence design of pure-exploration strategies in structured bandit models and point to tightening the bounds in the multi-answer regime as an open direction.
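To make the oracle-weight quantity concrete, here is a minimal sketch for one special case that is not taken from the paper: best-arm identification with Gaussian arms of known variance. In that case the inverse characteristic time has the well-known form $T^{\star}(\boldsymbol{\mu})^{-1} = \sup_{w} \min_{a \neq a^{\star}} \frac{w_{a^{\star}} w_a}{w_{a^{\star}} + w_a} \frac{(\mu_{a^{\star}} - \mu_a)^2}{2\sigma^2}$. The function names, the grid search, and the restriction to two arms in the second function are illustrative assumptions.

```python
def gaussian_bai_value(mu, w, sigma=1.0):
    """Value of the inner minimum for Gaussian best-arm identification.

    Assumption (not from the paper): arms are Gaussian with known
    standard deviation `sigma`, and the best arm is unique.
    """
    best = max(range(len(mu)), key=lambda a: mu[a])
    costs = []
    for a in range(len(mu)):
        if a == best:
            continue
        total = w[best] + w[a]
        if total == 0.0:
            # No samples on either arm: the two arms cannot be separated.
            costs.append(0.0)
            continue
        gap = mu[best] - mu[a]
        costs.append(w[best] * w[a] / total * gap ** 2 / (2.0 * sigma ** 2))
    return min(costs)


def characteristic_time_two_arms(mu, sigma=1.0, grid=1000):
    """Approximate T*(mu) and the oracle weights for two arms by grid search."""
    best_value, best_w = 0.0, (0.5, 0.5)
    for i in range(1, grid):
        x = i / grid
        value = gaussian_bai_value(mu, (x, 1.0 - x), sigma)
        if value > best_value:
            best_value, best_w = value, (x, 1.0 - x)
    if best_value == 0.0:
        raise ValueError("arms are indistinguishable; T*(mu) is infinite")
    return 1.0 / best_value, best_w


if __name__ == "__main__":
    T_star, weights = characteristic_time_two_arms([1.0, 0.0])
    print(T_star, weights)
```

For two unit-variance arms the maximizer is the uniform allocation, so the grid search should agree with the closed form $T^{\star}(\boldsymbol{\mu}) = 8\sigma^2 / \Delta^2$, where $\Delta$ is the gap between the two means. With more arms the outer supremum requires a proper optimizer over the simplex, which this sketch does not attempt.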

Abstract

In pure exploration problems, a statistician sequentially collects information to answer a question about some stochastic and unknown environment. The probability of returning a wrong answer should not exceed a maximum risk parameter $\delta$, and good algorithms make as few queries to the environment as possible. The Track-and-Stop algorithm is a pioneering method to solve these problems. Specifically, it is well known that it enjoys asymptotically optimal sample complexity guarantees as $\delta \to 0$ whenever the map from the environment to its correct answers is single-valued (e.g., best-arm identification with a unique optimal arm). The Sticky Track-and-Stop algorithm extends these results to settings where, for each environment, there might exist multiple correct answers (e.g., $\varepsilon$-optimal arm identification). Although both methods are optimal in the asymptotic regime, their non-asymptotic guarantees remain unknown. In this work, we fill this gap and provide non-asymptotic guarantees for both algorithms.

Paper Structure

This paper contains 29 sections, 13 theorems, 75 equations.

Key Result

Theorem 1

Let $i^{\star}(\cdot)$ be single-valued, and suppose that the sub-Gaussianity and boundedness assumptions (ass:subgauss and ass:bounded) hold. Then, the expected stopping time of TaS satisfies $\mathbb{E}_{\boldsymbol{\mu}}[\tau_\delta] \le 10 K^4 + \frac{\pi^2}{24} + T_0(\delta)$, where $T_0(\delta)$ is given by [...] and $g(t)$ is such that $g(t) = \widetilde{\mathcal{O}}\left( K^2 \sqrt{D_K t} + \sqrt{D_K t^{3/2}} \right)$, with $\widetilde{\mathcal{O}}(\cdot)$ ...
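The following short calculation sketches why additive terms of this kind are compatible with asymptotic optimality. It assumes, consistently with the TL;DR but without relying on the exact definition of $T_0(\delta)$ (which is not reproduced here), that $T_0(\delta) / \log(1/\delta) \to T^{\star}(\boldsymbol{\mu})$ as $\delta \to 0$.

```latex
\begin{align*}
\frac{\mathbb{E}_{\boldsymbol{\mu}}[\tau_\delta]}{\log(1/\delta)}
  &\le \frac{10 K^4 + \pi^2/24}{\log(1/\delta)}
     + \frac{T_0(\delta)}{\log(1/\delta)}, \\
\limsup_{\delta \to 0}
\frac{\mathbb{E}_{\boldsymbol{\mu}}[\tau_\delta]}{\log(1/\delta)}
  &\le 0 + T^{\star}(\boldsymbol{\mu})
  \qquad \text{(under the assumption on } T_0(\delta) \text{ above)}.
\end{align*}
```

The term $10 K^4 + \pi^2/24$ does not depend on $\delta$, so it vanishes after normalization by $\log(1/\delta)$; it only matters in the finite-confidence regime, which is the regime the theorem addresses.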

Theorems & Definitions (26)

  • Theorem 1: Non-Asymptotic Bound for TaS
  • Theorem 2: Non-Asymptotic Bound for Sticky-TaS
  • Proposition 1: Lower Bound for Single-Answer Problems (garivier2016optimal)
  • proof
  • Proposition 2: Lower Bound for Multiple-Answer Problems (degenne2019pure)
  • proof
  • Lemma 1: Expectation Upper Bound
  • proof
  • Lemma 2: Learning the Equilibrium (TaS)
  • proof
  • ...and 16 more
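
For orientation, the single-answer lower bound referenced in Proposition 1 is commonly stated in the form below. This is a sketch of the standard statement from the cited reference, not a transcription of the proposition as it appears in this paper; the symbols $\Delta_K$ (the simplex over $K$ arms), $\mathrm{Alt}(\boldsymbol{\mu})$ (environments with a different correct answer), and $d(\cdot,\cdot)$ (the Kullback-Leibler divergence between arm distributions) are notation assumed here.

```latex
\begin{align*}
\mathbb{E}_{\boldsymbol{\mu}}[\tau_\delta]
  &\ge T^{\star}(\boldsymbol{\mu}) \, \mathrm{kl}(\delta, 1-\delta), \\
T^{\star}(\boldsymbol{\mu})^{-1}
  &= \sup_{w \in \Delta_K} \inf_{\boldsymbol{\lambda} \in \mathrm{Alt}(\boldsymbol{\mu})}
     \sum_{a=1}^{K} w_a \, d(\mu_a, \lambda_a), \\
\mathrm{kl}(\delta, 1-\delta)
  &= (1 - 2\delta) \log\frac{1-\delta}{\delta}
  \;\sim\; \log(1/\delta) \quad \text{as } \delta \to 0.
\end{align*}
```

An upper bound whose leading term is $T^{\star}(\boldsymbol{\mu}) \log(1/\delta)$ therefore matches this lower bound to first order as $\delta \to 0$.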