Non-Asymptotic Analysis of (Sticky) Track-and-Stop
Riccardo Poiani, Martino Bernasconi, Andrea Celli
TL;DR
This work delivers the first non-asymptotic, finite-confidence guarantees for Track-and-Stop (TaS) and Sticky Track-and-Stop (S-TaS) in pure exploration, covering both single-answer and multi-answer problems. The analysis combines three ingredients: a concentration-based good-event argument, a projection-based sampling rule, and an information-theoretic characterization through the oracle weights and the quantity $T^{\star}(\boldsymbol{\mu})^{-1}$. With these, the authors derive explicit upper bounds on the expected stopping time that recover asymptotic optimality as $\delta \to 0$. The results show that TaS enjoys finite-confidence guarantees when the correct answer is unique. S-TaS extends these guarantees to multi-answer settings, with bounds that involve problem-dependent constants such as $T_{\boldsymbol{\mu}}$ and an $\varepsilon$-stability radius. These findings provide principled, actionable guarantees for designing finite-confidence pure-exploration strategies in structured bandit models, and they suggest that the bounds in the multi-answer regime could be tightened.
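For readers less familiar with the asymptotic theory, the block below recalls the standard single-answer definitions behind the oracle weights and $T^{\star}(\boldsymbol{\mu})$, as they usually appear in the Track-and-Stop literature. We assume the paper uses the same objects, although its exact notation may differ. Here $\Sigma_K$ is the probability simplex over the $K$ arms, $\mathrm{Alt}(\boldsymbol{\mu})$ is the set of environments whose correct answer differs from that of $\boldsymbol{\mu}$, $d$ is the per-arm divergence (e.g., the KL divergence between arm distributions), and $\mathrm{kl}$ is the binary relative entropy. Because $\mathrm{kl}(\delta, 1-\delta) \sim \log(1/\delta)$, the lower and upper bounds together give $\mathbb{E}_{\boldsymbol{\mu}}[\tau_\delta] / \log(1/\delta) \to T^{\star}(\boldsymbol{\mu})$. The non-asymptotic bounds of the paper are not reproduced here.

```latex
% Standard single-answer quantities (sketch; the paper's notation may differ)
\begin{aligned}
T^{\star}(\boldsymbol{\mu})^{-1}
  &= \sup_{w \in \Sigma_K} \; \inf_{\boldsymbol{\lambda} \in \mathrm{Alt}(\boldsymbol{\mu})}
     \; \sum_{a=1}^{K} w_a \, d(\mu_a, \lambda_a),
  \qquad w^{\star}(\boldsymbol{\mu}) \text{ = a maximizer of the sup above}, \\[4pt]
\mathbb{E}_{\boldsymbol{\mu}}[\tau_\delta]
  &\ge T^{\star}(\boldsymbol{\mu}) \, \mathrm{kl}(\delta, 1-\delta)
  \qquad \text{for every } \delta\text{-correct algorithm}, \\[4pt]
\limsup_{\delta \to 0} \frac{\mathbb{E}_{\boldsymbol{\mu}}[\tau_\delta]}{\log(1/\delta)}
  &\le T^{\star}(\boldsymbol{\mu})
  \qquad \text{for Track-and-Stop (asymptotic optimality)}.
\end{aligned}
```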
Abstract
In pure exploration problems, a statistician sequentially collects information to answer a question about an unknown stochastic environment. The probability of returning a wrong answer must not exceed a maximum risk parameter $\delta$, and good algorithms make as few queries to the environment as possible. The Track-and-Stop algorithm is a pioneering method for solving these problems. Specifically, it is well known that it enjoys asymptotically optimal sample complexity guarantees as $\delta \to 0$ whenever the map from the environment to its correct answers is single-valued (e.g., best-arm identification with a unique optimal arm). The Sticky Track-and-Stop algorithm extends these results to settings where, for each environment, there may exist multiple correct answers (e.g., $\varepsilon$-optimal arm identification). Although both methods are optimal in the asymptotic regime, their non-asymptotic guarantees remain unknown. In this work, we fill this gap and provide non-asymptotic guarantees for both algorithms.
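The abstract's running example, best-arm identification with a unique optimal arm, can be made concrete with a small sketch. This is not the authors' implementation, and every function name below is hypothetical. It assumes unit-variance Gaussian arms and $K = 3$. It approximates the oracle weights by brute-force grid search over the simplex, and it samples with D-tracking, one of the tracking rules used in the Track-and-Stop literature, which may differ from the projection-based rule analyzed in the paper. It stops with the generalized likelihood ratio (GLR) statistic compared against a heuristic placeholder threshold, not the calibrated threshold that makes the rule $\delta$-correct.

```python
import math


def inner_inf(w, mu, sigma2=1.0):
    """inf over lambda in Alt(mu) of sum_a w_a * d(mu_a, lambda_a).

    Closed form for Gaussian arms with known variance sigma2 in best-arm
    identification. The expression is homogeneous of degree one in w, so w
    can be simplex weights or raw pull counts (then it is the GLR statistic).
    """
    best = max(range(len(mu)), key=lambda a: mu[a])  # empirical best arm
    value = math.inf
    for b in range(len(mu)):
        if b == best:
            continue
        wa, wb = w[best], w[b]
        if wa + wb == 0:
            return 0.0  # no information on this pair, so the infimum is 0
        gap2 = (mu[best] - mu[b]) ** 2
        value = min(value, wa * wb / (wa + wb) * gap2 / (2 * sigma2))
    return value


def oracle_weights_grid(mu, step=0.01):
    """Approximate w*(mu) and T*(mu)^{-1} by grid search (K = 3 only)."""
    assert len(mu) == 3, "this brute-force sketch only handles K = 3"
    n = round(1 / step)
    best_w, best_val = None, -1.0
    for i in range(n + 1):
        for j in range(n + 1 - i):
            w = (i / n, j / n, (n - i - j) / n)
            val = inner_inf(w, mu)
            if val > best_val:
                best_w, best_val = w, val
    return best_w, best_val


def next_arm(counts, mu_hat):
    """D-tracking step; assumes every arm has already been pulled once."""
    k, t = len(counts), sum(counts)
    # Forced exploration: pull the least-sampled arm among the under-sampled ones.
    under = [a for a in range(k) if counts[a] < math.sqrt(t) - k / 2]
    if under:
        return min(under, key=lambda a: counts[a])
    # Otherwise track the plug-in oracle weights w*(mu_hat).
    w_hat, _ = oracle_weights_grid(mu_hat)
    return max(range(k), key=lambda a: t * w_hat[a] - counts[a])


def should_stop(counts, mu_hat, delta):
    """GLR stopping rule; assumes sum(counts) >= 1."""
    t = sum(counts)
    # Hypothetical heuristic threshold, NOT the calibrated one from the paper.
    beta = math.log((1 + math.log(t)) / delta)
    return inner_inf(counts, mu_hat) > beta
```

For example, `oracle_weights_grid([1.0, 0.5, 0.0])` returns approximate oracle weights together with the value of the sup-inf objective. The inverse of that value approximates $T^{\star}(\boldsymbol{\mu})$, and multiplying it by $\log(1/\delta)$ gives the leading term of the asymptotic sample complexity. The grid search costs $O(1/\text{step}^2)$ evaluations of the objective. It only keeps the sketch dependency-free; practical implementations solve the optimization far more efficiently.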
