Power comparison of sequential testing by betting procedures
Amaury Durand, Olivier Wintenberger
TL;DR
This work develops a comprehensive theory for safe, anytime-valid sequential testing using test supermartingales, focusing on bounded-mean hypotheses and two betting-based procedures, Hoeffding and Capital, together with a two-step Capital variant. It provides non-asymptotic power guarantees, introduces variance-constrained alternatives, and derives explicit bounds on rejection times under general (possibly time-varying) alternatives, including multidimensional settings. The authors extend the framework to composite nulls and other functionals, and demonstrate applications to forecaster evaluation and comparative testing, with extensive numerical simulations showing the relative strengths of FTL Hoeffding, EWA/ONS Capital, and two-step strategies. The results highlight a detection boundary of order $\mathcal{O}(\log n / n)$ for the Capital test under suitable second-order conditions, while dimension and variance considerations guide the choice of betting strategy in practice. Overall, the paper advances safe sequential inference by linking online betting strategies, explicit power guarantees, and practical applications in forecasting and evaluation.
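As a rough illustration of a Capital-type procedure, the sketch below tests the one-sided null H0: E[X_t | past] <= m for observations X_t in [0, 1]. The capital is multiplied by 1 + lam_t (X_t - m) at each step, and H0 is rejected once it reaches 1/alpha; Ville's inequality makes this a level-alpha, anytime-valid test. The bets lam_t come from an Online Newton Step (ONS) update. The function name `capital_ons_test`, the one-sided null, the cap lam_t <= 1/(2m) and the ONS step constant (borrowed from standard coin-betting analyses) are assumptions for this example, not the authors' exact tuning. Validity itself only requires lam_t to be chosen before X_t is observed and to stay in [0, 1/m], so that 1 + lam_t (X_t - m) >= 0.

```python
import math
import random
from typing import Iterable, Optional


def capital_ons_test(xs: Iterable[float], m: float, alpha: float = 0.05) -> Optional[int]:
    """Sequential Capital test of H0: E[X_t | past] <= m, with X_t in [0, 1].

    Returns the first (1-indexed) time the capital reaches 1/alpha,
    or None if H0 is not rejected on the given data.
    Hypothetical helper for illustration; the bet sizes follow an ONS rule.
    """
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie strictly inside (0, 1)")
    lam_max = 1.0 / (2.0 * m)          # keeps 1 + lam * (x - m) >= 1/2 for x in [0, 1]
    c = 2.0 / (2.0 - math.log(3.0))    # ONS step constant (assumption, from coin betting)
    threshold = math.log(1.0 / alpha)
    lam, a_sum, log_capital = 0.0, 1.0, 0.0
    for t, x in enumerate(xs, start=1):
        g = x - m
        # lam was fixed before seeing x (predictable), so the capital is a
        # nonnegative supermartingale under H0
        log_capital += math.log1p(lam * g)
        if log_capital >= threshold:   # Ville: P(sup_t K_t >= 1/alpha) <= alpha
            return t
        # ONS step on the loss -log(1 + lam * g), projected onto [0, lam_max]
        z = -g / (1.0 + lam * g)
        a_sum += z * z
        lam = min(max(lam - c * z / a_sum, 0.0), lam_max)
    return None


rng = random.Random(0)
stream = [1.0 if rng.random() < 0.7 else 0.0 for _ in range(2000)]
print(capital_ons_test(stream, m=0.5, alpha=0.05))
```

Working with the log-capital through `math.log1p` avoids overflow on long streams; the rejection rule is unchanged because log is monotone.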
Abstract
In this paper, we derive power guarantees for some sequential tests of a bounded mean under general alternatives. We focus on testing procedures based on nonnegative supermartingales, which are anytime valid, and consider alternatives that coincide asymptotically with the null (e.g. a vanishing mean) while still allowing rejection in finite time. Introducing variance constraints, we show that the alternative can be broadened while keeping power guarantees for certain second-order testing procedures. We also compare different test procedures in the multidimensional setting using characteristics of the rejection times. Finally, we extend our analysis to other functionals, as well as to testing and comparing forecasters. Our results are illustrated with numerical simulations, including bounded-mean testing and the comparison of forecasters.
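To make the supermartingale construction concrete, here is a minimal sketch of a Hoeffding-type test. For X_t in [0, 1] and a predictable lam_t >= 0, Hoeffding's lemma makes the product of exp(lam_t (X_t - m) - lam_t^2 / 8) a nonnegative supermartingale under H0: E[X_t | past] <= m, and rejecting when it reaches 1/alpha is anytime valid by Ville's inequality. The bet lam_t is chosen by follow-the-leader (FTL) on that exponent, which gives lam_t = max(0, 4 * average of past (X_s - m)). The function name `hoeffding_ftl_test` and this particular FTL rule are assumptions made for illustration, not the paper's exact procedure.

```python
import math
from typing import Iterable, Optional


def hoeffding_ftl_test(xs: Iterable[float], m: float, alpha: float = 0.05) -> Optional[int]:
    """Sequential Hoeffding-type test of H0: E[X_t | past] <= m, with X_t in [0, 1].

    Returns the first (1-indexed) time the log-statistic reaches log(1/alpha),
    or None if H0 is not rejected on the given data.
    Hypothetical helper for illustration; lam_t follows an FTL rule.
    """
    threshold = math.log(1.0 / alpha)
    log_stat = 0.0   # log of prod_s exp(lam_s * (x_s - m) - lam_s**2 / 8)
    past_sum = 0.0   # sum of (x_s - m) over already observed s
    for t, x in enumerate(xs, start=1):
        # FTL: maximise sum_{s<t} [lam * (x_s - m) - lam**2 / 8] over lam >= 0,
        # using past data only, so lam is predictable
        lam = max(0.0, 4.0 * past_sum / (t - 1)) if t > 1 else 0.0
        log_stat += lam * (x - m) - lam * lam / 8.0
        if log_stat >= threshold:  # Ville's inequality gives level alpha
            return t
        past_sum += x - m
    return None


print(hoeffding_ftl_test([1.0] * 20, m=0.5))
```

On a stream of ones with m = 0.5, FTL picks lam_t = 2 from the second step on and each step adds 0.5 to the log-statistic, so with alpha = 0.05 (threshold log 20, about 3.0) the test rejects at t = 7.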
